Skip to main content

Full text of "Numerical Methods 1 ( Iteration Programming & Algebraic Equations )"

See other formats

Numerical Methods: 1 

Programming and 
Algebraic Equations 

B. NOBLE dsc 


General Editors 

Alexander C. Aitken D'Sc FRS 

D. E. Rutherford DSc DrMath 























Volume and Integral 
Tensor Calculus 


Determinants and Matrices ... A. C. Aiikcn, D.Sc., F.R.S. 

Statistical Mathematics A. C. Aiiken, D.Sc., F.R.S. 

Integration R. P. Gillespie, Ph.D. 

Integration of Ordinary Differential Equations E. L. Incc, D.Sc. 

Vector Methods D. E. Rutherford, D.Sc., Dr.Malh. 

Theory of Equations H. W. Turnbull, F.R.S. 

Functions of a Complex Variable ... E. G. Phillips, M.A., M.Sc. 

Wanes C. A. Coulson, M.A., D.Sc., F.R.S. 

Infinite Series J. M. Hyslop, D.Sc. 

Analytical Geometry of 

Tliree Dimensions W. H. McCrea, Ph.D., F.R.S. 

Electricity C. A. Coulson, M.A.. D.Sc.. F.R.S. 

Projective Geometry T. E. Faulkner, Ph.D. 

Introduction to the Theory of 

Finite Groups W. Ledcrmann, Ph.D., D.Sc. 

Classical Mechanics ... D. E. Rutherford, D.Sc., Dr.Malh. 

Partial Differentiation R. P. Gillespie, Ph.D. 

... W. VV. Rogosinski, Dr. Phil., F.R.S. 

B. Spain. Ph.D. 

German-English Mathematical 

Vocabulary ... S. Macintyre, Ph.D., and E. Witte, M.A. 

Special Functions of Mathematical 

Physics and Chemistry I. N. Sneddon, D.Sc. 

Topology E. M. Patterson, Ph.D. 

The Theory of Ordinary 

Differential Equations 
Fluid Dynamics 
Special Relativity 

Real Variable 

Russian Reader in 

Pure and Applied Mathematics . . . 
Russian-English Mathematical 

Vocabulary J. Burlak, M.Sc., Ph.D., and K. Brooke, M.A. 
Introduction to Field Theory ... lain T. Adamson, Ph.D. 

Number Theory J. Hunter, Ph.D. 

Numerical Methods'. I. Iteration, 

Programming and Algebraic Equations ... B. Noble. D.Sc. 

Numerical Methods'. 2. Differences. 

Integration and Differential Equations B. Noble, D.Sc. 

Elementary Abstract Algebra 

E. M. Patterson, Ph.D., and D. E. Rutherford, D.Sc., Dr.Math. 
Vector Spaces of Finite Dimension G. C. Shephard, M.A., Ph.D. 
Magnetohydrodynamics A. Jeffrey, Ph.D. 

... J. C. Burkill, Sc.D, F.R.S. 
D. E. Rutherford, D.Sc., Dr.Math. 

W. Rindler, Ph.D. 

J. M. Hyslop, D.Sc. 

... P. H. Nidditch. Ph.D. 




M.A., B.Sc. D.Sc., A.M.I.E.E. 









Tweeddale Court 
Edinburgh 1 

39a Welbeck Street 
London W.l 

First published 


© 1964 B. Noble 

Primed in Great Britain by 
Oliver and Boyd Ltd., Edinburgh 


It is desirable that all students or applied mathematics, the 
physical sciences, and engineering should be given an 
introduction to numerical analysis. The aim of this book 
is to provide, for this purpose, an elementary exposition of 
certain basic ideas. This has been done by considering a 
strictly limited selection of typical methods with particular 
emphasis on fundamentals, including error analysis, error 
estimation, pitfalls, and the detection of mistakes. An 
effort has been made to avoid the common practice of 
regarding numerical methods as a collection of algorithms 
or recipes. Rather than use a given amount of space to 
describe several different methods for solving the same type 
of problem, it has been thought preferable to illustrate 
general principles by discussing one single method in some 
detail. Numerical analysis should be a source of useful 
tools but it should also make the student think. 

The advent of automatic computers has produced a 
revolution in computing. For this reason programming 
for computers has been made an integral part of the 
exposition. Although it is essential to teach the basic ideas 
of programming, the author believes that the time spent on 
the technical details in an elementary course should be cut 
to a minimum. There arc advantages in teaching a simple 
general form of automatic programming or autocode of the 
type developed in Chapter III, which need occupy no more 
than one or two lectures. The system is not applicable to 
any particular machine but it contains many of the features 
common to autocodes. If it is preferred to teach a program- 
ming system used by an existing machine, the programs in 



this book can easily be converted into the appropriate 
autocode. (The word " program " is used in the technical 
sense of a set of instructions specifying a procedure or 
programme for solving a problem on a real or hypothetical 
automatic computer.) 

An automatic computer is indispensable for any institu- 
tion concerned with the advanced teaching of scientists, but 
the place which should be occupied by computers in teaching 
elementary numerical analysis is likely to remain the subject 
of debate for many years. One can visualise courses in 
which all the practical work is done on automatic com- 
puters, and equally successful courses in which only hand 
machines are used. It is instructive and stimulating for a 
student to see a computer in operation but he need not 
become a programmer to learn about numerical analysis. 
Hand computing and automatic computing may require 
different habits of thought, but the basic principles are 
common and these can almost always be illustrated by 
simple examples. Difficulties in large-scale calculations are 
not usually due to the emergence of some radically new 
kind of error. The main point is that all students should 
have first-hand experience of numerical calculation, how- 
ever this is obtained. There is usually a " correct " answer 
to a problem in numerical mathematics in a precisely 
defined sense. The onus is on the person specifying the 
calculating procedure to ensure that the information 
obtained during the computation will indicate in what sense, 
if any, the calculation has resulted in an approximation to 
the " correct " answer. It may help to impress this upon 
the student if he sees an automatic computer produce, 
without warning, numbers which are meaningless and 
wrong, as the result of a program which he himself has 

When all computations have to be performed by hand, 
the student tends to devote a major part of his effort to the 
technical details of numerical procedures. When the 



attitude is adopted that most calculations will be performed 
by computer, the emphasis in introductory courses can be 
shifted from the details of methods to the theory behind 
classes of methods. Though the student should have some 
acquaintance with the details of typical procedures, there is 
little point in teaching complicated algorithms to students 
who will normally meet such algorithms only in standard 
computer programs devised by experts. Thus the author 
would not normally teach Bairstow's method for finding the 
complex roots of polynomials (§ 2.7) in an elementary 

Many examination questions on numerical methods are 
designed to test powers of regurgitation rather than under- 
standing. The standard questions beginning " Describe a 
method for . . ." and " Find the solution of . . ." should 
not exclude other types of question, for example : " The 
calculations described below gave the following results. . . . 
Find out whether the required solution of the problem can 
be deduced from these results. In particular estimate the 
effect of truncation error (or round-off error, or instability 
of the method, or ill-conditioning of the original problem). 
Suggest methods for checking your conclusions." 

It seems to be common practice to teach numerical 
analysis and basic mathematics in separate courses, and to 
postpone numerical analysis to a late stage —even a post- 
graduate stage — in a student's career. The author believes 
that this is wrong in principle and that much of the material 
in this book should ultimately be integrated into courses on 
basic mathematics for applied scientists. The material 
should be used to illustrate the mathematics, and to provide 
motivation and stimulation for more mathematics. 

Operator methods for the derivation of finite-difference 
formulae have been omitted completely. Experience in 
teaching indicates that students quickly acquire facility in 
obtaining complicated formulae by operators without 
having much idea of the limitations or implications of their 



results. It seems preferable in an introductory course to 
concentrate attention on the basic underlying principle, 
namely that functions arc being represented by polynomials 
through equally spaced points. Other types of functional 
representation can be introduced as natural alternatives. 

Most error estimates have been obtained in the form 
of a dominant term followed by a series of dots. Little 
attempt has been made to obtain closed error estimates by 
using a Taylor scries with a remainder term, or mean-value 
theorems, since at an elementary level this complicates the 
analysis without contributing materially to the understand- 
ing of either the method or the results. Questions of 
elegance and rigour enter into the discussion of a numerical 
method but this book is not aimed at the pure mathematician. 
In some ways numerical analysis should be approached as 
if it were an experimental science. An exact answer exists 
for any given problem, and the numerical analyst attempts 
to determine this exact answer, to a specified degree of 
accuracy, by means of suitably designed experiments. 
Mathematics is the tool, but not the aim or object, of the 

Short chapters are included on eigenvalues and partial 
differential equations. These are topics which are normally 
regarded as " advanced ", but from a numerical point of 
view they can be treated in almost as elementary a fashion 
as the solution of algebraic equations, or ordinary dif- 
ferential equations. 

For further reading it is recommended that the student 
start with Modern Computing Methods, H.M. Stationery 
Office, 2nd. Edn. (1961), which both supplements and 
complements the present text, and includes a list of papers 
and books. In view of the comprehensive bibliography in 
Modern Computing Methods it has seemed unnecessary to 
include more than a few (non-systematic) references in the 
present text. 

Unless otherwise indicated the numerical examples have 



been worked on a hand-machine, and as a general rule one 
extra decimal place has been carried in the intermediate 
working. When the student checks numerical examples 
his results may of course differ slightly from those in the 
text in the last significant digit. Many of the examples at 
the ends of chapters are intended to supplement the text, 
not to provide additional exercises. 

The origins of this book can be traced back to the 
lectures of the late D. R. Hartree, and a year spent in the 
Mathematical Laboratory, Cambridge, under M. V. 
Wilkes in 1947-48 during the exciting early days of EDSAC. 
Even though this experience failed to convert me into a 
professional computer, it taught me that numerical analysis 
is much more than an exercise in pure mathematics or a 
collection of recipes. 

The preparation of the manuscript of this book was a 
time-consuming occupation. I feel indebted to the mathe- 
matics department of the future second university in 
Glasgow, to my wife, and to all correspondents who have 
had to wait for answers to letters. They had much to 
endure while the second draft of this book was being 
typed with one finger. 

It is a pleasure to acknowledge help from many friends. 
The ideas in Chapter III on programming for automatic 
computers originated from using the system known as 
FORTRAN (FORmula TRANslalion) on the IBM 704 at 
the New York University AEC Computing Facility. I 
am grateful to E. Isaacson for encouragement to use this 
machine and to Max Goldstein for indispensable advice. 
The work was made possible by contract AF 19(604)5238 of 
the Electronics Directorate of the U.S. Air Force Cambridge 
Research Centre, which financed a visit to the Division of 
Electromagnetic Research, the Institute of Mathematical 
Sciences, New York University. A first draft of Chapter III 
was considerably altered after correspondence with T. 
Brooker and S. Gill, and a subsequent meeting with members 


of the committee on automatic programming of the British 
Computer Society, chairman C. Strachey. 

A remark of T. Brooker changed the emphasis of 
Chapter II from " solution of equations " to " iterative 
procedures ". D. J. Wheeler introduced me to the idea of 
the " noise-level " of an iterative calculation. Various 
comments in the text on the representation of functions by 
polynomials have been modified as a result of criticisms by 
G. F. Miller. Several points have been clarified by dis- 
cussions with A. J. Howie, G. N. Lance, A. van Wijngaarden 
and J. H. Wilkinson. 

Many of the numerical examples have been checked by 
A. Paton and M. V. Sweet. Numerical results have been 
contributed by: J. G. Fraser and M. V. Sweet (Ex. 10.23), 
A. Paton (Table 4.1), M. Walker (Ex. 11.8), F. J. Warner 
(Ex. 10.17), S. B. L. Wilson (the table (11.26)), C. S. Wu 
(Ex. 10.22). 

A. C. Aitken, J. G. Fraser and M. V. Sweet have spent 
much time in reading the manuscript in painstaking detail. 
Their valuable criticisms have increased the accuracy and 
readability of the text and the examples. G. Fairweather, 
R. C. Peebles and B. Shaw have assisted in proof- 
reading and contributed helpful comments. D. Greenspan 
has suggested improvements and detected inaccuracies in 
both manuscript and proofs. I am indebted to the editors, 
A. C. Aitken and D. E. Rutherford, for advice and recom- 

Above all I am grateful to Miss A. Paton. In a period 
of numerous distractions her assistance ensured a regular 
weekly production of manuscript. It is certain that without 
her help this book would never have been written. 


Mill Cottage, 
Broughton Mills, Lancs. 
July, 1962. 




I. Accuracy and Error 






Rounding off 



Absolute and relative errors 



Error analysis and control 



The evaluation of formulae on desk machines 





Examples I 


11. Iterative Methods, with Applications to 

the Solution of Equations 





A simple iterative method 



The Newton-Raphson iterative method 



General aspects of iterative procedures 



Real roots of polynomials 



Errors when finding roots of polynomials 



Bairstow's method for finding complex roots of 



Examples II 


III. Elementary Programming for Automatic 






Simple programs 



Some programs involving iterative procedures 



General comments 


Examples HI 




IV. Simultaneous Linear Algebraic Equations 

4.1 Introduction 80 

4.2 The method of successive elimination 82 

4.3 Choice of pivots and scaling 87 

4.4 Inherent error and ill-conditioned equations 92 

4.5 A computer program for the method of successive 

elimination 98 

Examples IV 100 

V. Matrix Methods 

5.1 Matrix algebra 102 

5.2 A compact elimination method for the solution 

of linear equations 105 

5.3 The inverse matrix 112 

Examples V 118 

VI. Eigenvalues and Eigenvectors 

6.1 Introduction 123 

6.2 An iterative method for finding the largest eigen- 

value 129 

6.3 The determination of subdominant eigenvalues 

and eigenvectors 135 

6.4 The iterative solution of linear simultaneous 

algebraic equations 138 

Examples VI 147 

Index 1 54 


VII. Finite Differences and the Approximate Repre- 
sentation of Functions 

VIII. Polynomial Interpolation 

IX. Numerical Integration and Differentiation 

X. Ordinary Differential Equations 

XI. Partial Differential Equations 



§ 1.1. Introduction. The solution of a problem in 
applied mathematics usually consists of numbers which 
satisfy some kind of equation. In theory these numbers 
may be specified exactly by the equation but even in the 
simplest cases (of which x = x /2 can be taken as a trivial 
example) it is normally impossible to write down an exact 
decimal representation of the solution. The aim of numerical 
methods is to provide practical procedures for calculating the 
solutions of problems in applied mathematics to a specified 
degree of accuracy. 

We shall be constantly preoccupied with the estimation 
and prevention of errors. This is merely recognition of the 
fact that to produce a number which is the answer to a 
problem is only half the aim of numerical methods. The 
other half, and often the more difficult half, is to ensure that 
the answer is correct within stated limits. Any numerical 
answer should be accompanied by a statement of ils maxi- 
mum error or its probable error. 

Errors occur in computation in four ways: 

(a) Round-off errors due to the need to represent 

numbers in decimal notation, using a finite 
number of decimal places. 

(b) Mistakes. 

(c) Errors due to the nature of the formulae used, e.g. 

the need to represent a continuous function by 
a table of numbers. 



(d) Some of the numbers which must be used in the 
computation may have inherent error, or may be 
known only approximately, e.g. physical data. 

It is a most important principle in numerical work that 
all sources of error must be borne constantly in mind. 

§ 1.2. Rounding off. The numbers 0-745 25 and 
0-007 452 5 are given to five and seven decimal places 
respectively. The number of significant figures is defined 
as the number of digits that are assumed to be correct, 
starting at the left with the first non-zero digit, and proceed- 
ing to the right. Thus if all the non-zero digits in the above 
numbers are correct, they both possess five significant 
figures. Zeros to the left of the first non-zero digit are not 
significant and serve merely to mark the decimal point. 
Zeros to the right of the last non-zero digit are regarded as 
significant. Thus 0-4300 possesses four significant figures. 
Ambiguity occurs with a number like 4300. This can be 
written 4-300 x 10 3 if it is desired to indicate that it is 
correct to four significant figures. 4-30 x 10 3 is correct to 
three significant figures. 

If only certain digits in a number arc correct the 
superfluous digits can be discarded by rounding off. The 
rule for doing this is as follows. To round off a number to 
n significant figures retain n significant digits and discard 
the remainder. If the discarded number is less than half a 
unit in the nth place leave the iith digit unchanged : if it 
is greater than half a unit in the wth place add unity to the 
;ith digit: if it is exactly half a unit round off to the nearest 
even digit. The object of the last part of the rule is to 
ensure that when exactly half a digit is discarded, numbers 
will be rounded up and rounded down in equal proportions, 
on the average. The only reason for rounding to even 
rather than to odd digits is that the resulting number is 
divisible by two without ambiguity. If 7-4525 is rounded 



off to 4, 3, 2 figures we obtain 7-452, 7-45, 7-5. But note 
that if 7-45 is rounded off to two significant figures, without 
further information, we obtain 7-4. 

§ 1.3. Absolute and relative errors. If n is an approxima- 
tion to a number N which is defined exactly, the error c of n 
is defined by N = n + e. (Some writers use the definition 
n = N+e but the distinction is not important though it is 
better to be consistent.) The absolute error is | e | and the 
(absolute) relative error is | e |/| N |. We shall be interested 
in comparatively rough estimates of the error and we assume 
that the relative error is small. In this case the relative 
error can be taken equal to | e |/| n | which has the advantage 
that the denominator is known. 

It is often possible to establish an upper limit for the 
magnitude of the error. For example if a number is given 
to exactly m decimal places then | c | g0-5 x 10~ m . We call 
this upper limit for the magnitude of the error the maximum 
error and denote it by e. Thus | e | ^e. The relative error 
is less than e/\ N\xe/\ n | (assuming that e is much less 
than | n |). 

Let 7i,, » 2 be approximations to N x , N 2 with errors 
e,, e 2 , so that 

Ni = ;ii+e,, N 2 = 71, + e,. 

Consider first addition and subtraction: 

#,+^ = (71, +71,) + (£,+£,), 
JV l -/V 2 = (77,-71 2 ) + (E 1 -E,). 

then \Ei+e 2 






.£i \<e t , 

I E i — s i l <e i + e 2- Hence if two numbers are added or 
subtracted, the maximum error in the sum or difference is 
the sum of the maximum errors in the two numbers. For 
multiplication we have 

N,N 2 = 71in 2 + 71 2 E 1 +/! 1 £ 2 + £ 1 £ 2 . 





If we denote n x ii 2 by n, and the error in n by e, 

c = #i 2 e, +ll 1 «a, i-e. (e/n) = (e,//ii) + (s 2 /M 2 ). 

where a second order term £,£2 has been neglected. Hence 
the relative error in a product is the sum of the relative 
errors in the two numbers which have been multiplied 
together. In a similar way, for division, 

N,/N 2 = (w 1 + £,)/(w 2 +fi 2 ) 

« /ir'("i+ei)(l-C2/"2) 

» (ni/n 3 )+(«i/»a)-(»i*a/ B a). 

where we have used the binomial theorem and neglected 
second-order terms. Introduce n = /j,/« 2 with error e. 

(fi/n) = (e, //;,)- (e 2 /« 2 ). 

Wc deduce from these results that for both multiplication 
and division the maximum relative error in the result is the 
sum of the maximum relative errors of the given numbers. 

Absolute error is connected with decimal places and is 
important when adding and subtracting. Relative error is 
connected with significant figures and is important when 
multiplying and dividing. 

We consider some numerical examples which illustrate 
and extend these remarks. It will be assumed that the 
numbers with which we start are correct to the last digit 
stated. Then a number with approximate representation 
3-24, say, must lie between 3-235 00... and 3-24499.... 
A convenient way of expressing this roughly is to say " 3-24 
is correct to 1 in 600 ". 

The sum 49-84+015 = 49-99 is correct to 1 in 5000 
even though one of the numbers is correct to only 1 in 30. 
The sum 0-4984+15 = 15-4984 is correct to only 1 in 30. 
The difference 49-99-49-84 = 015 is correct to only 1 in 15 
although the original numbers are both correct to 1 in 

10,000. The loss of significant figures in this way is one of 
the most insidious sources of error in numerical work. 
It is often possible to avoid errors due to the subtraction of 
two nearly equal numbers by transforming the formulae 
used in the computation, as illustrated in exs. 1.1, 1.2 

The sum S = 15-9+ 13-444+10-7246 = 400686 is 
accurate only to +005. We can discard the last two digits 
in the answer, writing S = 4007±005. It would be 
slightly misleading to write the answer as 401, but this is 
of course permissible if it is recognised that the last digit 
may be nearly one unit in error. Alternatively one could 
write S = 40-0 7 where the notation indicates that the 
number is accurate to within a few units in the depressed 
digit. We could save ourselves labour if we rounded off to 
two decimal places before adding : 5« 15-9 + 13-44+ 10-72 
= 4006. This is not strictly speaking correct since the sum 
is in fact 4007, but a unit in the second decimal place is not 
important when we know that the result may be in error by 
live units in this place. On the other hand it is not per- 
missible to write 5= 15-9 + 13-4+10-7 = 400. The errors 
we have introduced by rounding off are now comparable 
with the inherent error already present so that we have 
unnecessarily increased the error. The moral is that if we 
wish to round oft* numbers before a series of additions and 
subtractions, and the least accurate number is accurate to /; 
decimal places, we should round off to n + r decimal places 
where r is a suitable positive integer usually chosen to be 
1 or 2. 

When multiplying or dividing, if the numbers have 
nearly the same relative error the relative error in the result 
is approximately twice the relative error in the individual 
numbers. If one number is more accurate than the other, 
the relative error is approximately the same as the relative 
error in the less accurate number. Thus 3-34 2 = 11-1556 
is accurate to 1 in 300, and 3-34x3-335= 111389 is 



accurate to 1 in 600. Although the less accurate number in 
each of these examples is given to three significant figures 
it would be misleading and inaccurate to round-off the 
answers to three significant figures. 

§ 1.4. Error analysis and control. When using a desk 
calculating machine, round-off errors arc under the direct 
control of the person doing the computation. It would be 
tiresome to have to round off every single number in a 
calculation to the appropriate number of significant figures. 
On the other hand it is essential to employ round-off to 
save unnecessary labour and keep the lengths of numbers 
within reasonable limits. One difficulty is illustrated bv 
the calculation 0014 63-=- 79-3 = 0000 184 5. To obtain 
an answer correct to 1 in 1000 it is necessary to carry seven 
decimal places in the answer. On the other hand if we can 
scale our variables so that the calculation reads 1 -463 -i-O-793 
= 1-845 we need carry only three decimals. It is desirable 
to arrange, by scaling or otherwise, that in the body of the 
calculation all numbers are rounded to n+ 1 or //+ 2 decimal 
places and that this then gives an answer correct to n 
decimals. A more elaborate example of scaling is given in 
§4.1, Ex. 4.1. 

A similar situation can arise in connection with auto- 
matic digital computers. These machines perform essentially 
the same arithmetic operations as a desk calculator — they 
add, subtract, multiply and divide numbers stored inside 
the machine (see § 3.1). Analysis of round-off errors will 
depend on the way in which numbers are stored. If a 
fixed-point system is used this means that numbers are 
represented with a fixed number of decimal places e.g. 
00015, 7931482. If a floating-point system is used numbers 
are represented with a fixed number of significant figures 
e.g. 1-463 xlO -3 , 7-931 x 10 2 , where we have placed the 
decimal point immediately after the first significant digit. 
The scaling difficulty mentioned in the last paragraph may 



be particularly important if calculations have to be per- 
formed in a fixed-point system. 

In hand computing it is possible to vary the number of 
decimal places being carried in the light of the results 
obtained as the calculation proceeds in order to obtain an 
answer of the required accuracy with the minimum of 
labour. In automatic computing the number of decimals 
depends on the design of the machine and it may be difficult 
to predict beforehand the course of a calculation. It is 
often easier to make the machine compute as accurate an 
answer as possible, subject to round-off and other errors, 
and then determine the accuracy that has been achieved, 
rather than try to produce an answer of specified accuracy. 

It is important to realise that calculations can often be 
arranged so that the effect of round-off errors is minimised. 
This is illustrated in the following two examples. 

Ex. 1.1. Find the roots of x 2 -26.x + 1 = 0. 

We have x = 13±7l68 = 13± 12-961. Hence the 
roots are .v, = 25-961, x 2 = 0039. The second root has 
been obtained to only two significant figures even though 
the square root was calculated to five figures. However, 
instead of calculating the second root in this way we can use 
the relation x t x 2 = 1, so that .v 2 = 1/25-961 = 0038 519, 
in error by at most one unit in the last digit. 

Ex. 1.2. Evaluate 

c = 

b-a ' 


withb = 0-485,a = 0-231,5 = 6-327 19, A = 6-322 81, 
where the data are accurate only to the number of figures 

If we use the formula for c as it stands, it might seem 
that the result will be accurate to only three significant 
figures since the denominator is accurate to only 1 in 250. 


However, if we rearrange the formula in the form 

c = A- 




we find, on substituting numbers and carrying out a rough 
error analysis, that c = 6-318 83, correct to within two units 
in the last digit quoted. If we substitute directly in the 
original formula (1.1) and carry sufficient places in the 
intermediate working we find 

c = 1-604 982/0-254 = 6-318 83. 

This is the same answer as before, accurate to two units in 
the last digit quoted, even though we know that the 
denominator is accurate to only 1 in 250. The reason is 
clear on performing an error analysis algebraically. Suppose 
that the exact value of a is given by a + da, and similarly for 
b, A, B, c. Then on neglecting second order quantities it is 
found that 



6c = (b + 5b)(A + 5A)- (a+5a)( B+ SB) 
(b-a) + (db-5a) 

*{(B- A)(aSb - bSa) + (/> - a)(.l>5A -aS B)}(b -a)' 2 . 

Terms involving A and B have cancelled to give a final error 
which depends only on (B- A), SA, and 5B, all of which are 
small. Substitution in this formula again gives the maximum 
possible error as two units in the fifth decimal place. 

Formula (1.2) is more convenient than (1.1) since the 
error can be estimated more directly from it, and since fewer 
decimal places need be carried in the computation to obtain 
a given accuracy in the final answer. This is an important 
example of the way in which a formula can be rearranged to 
give a more suitable formula for computation. 

Error analysis in terms of maximum error, as discussed 
in the last section, is useful in avoiding particular sources 
of error, but in a long calculation an error analysis worked 



out on the basis of the maximum possible error will over- 
estimate the errors which arc likely to occur in practice by 
a very large factor. For instance if the maximum absolute 
error of a single result is e then the maximum possible error 
that can be obtained when adding n numbers is ne, though 
this is very unlikely to occur in practice. In fact if errors 
are randomly distributed between ±e, there is only one 
chance in 20 that the error in the sum of n numbers will 
exceed 2(/j/3)*e, assuming that the errors in the n numbers 
are independent. There is only one chance in 370 that the 
error will exceed 3(/;/3)*<>, and this number is much less 
than ne if/; is large. More precisely, the following results 
are given by statistical theory.f The details lie outside the 
scope of this book. If errors are distributed around zero 
according to a frequency distribution y = /(*), the variance 
of the error in a single number is defined to be 

= ' .v 2 /(.v)r/.v, 

J —00 

f(x)dx = 1 

The variance of the error in the sum of n numbers is no 2 
and the standard deviation of the error of the sum is the 
square root of the variance, or /iV. The distribution of the 
sum tends to the normal or Gaussian form as n increases. 
Thus there is only one chance in 20 of obtaining an error 
of more than twice the standard deviation. As an applica- 
tion of these results suppose that errors are uniformly 
distributed between ±0-5, so that/(.v) = 1, -0-5< v<0-5 
and /(.v) = for I x | >0-5. Then 

, J* " 5 


x 2 dx = -£. 

The standard deviation of the error of the sum of n numbers 
is therefore (n/12)*. On adding 100 numbers the standard 

t A. C. Ailken, Statistical Mathematics, Oliver and Boyd (1942) 
PP. 35, 62. 




deviation is approximately 3 and if the errors are randomly 
distributed it is unlikely that the total error will be more 
than ±6. It is very unlikely that the error will be more 
than ±10. In contrast, the maximum possible error is 
+ 50. Statistical considerations are particularly important 
in connection with automatic computers, where calculations 
may involve an extremely large number of operations. 

From an analogy with noise in an electrical circuit, 
round-off errors which occur at random can be regarded as 
" noise " superimposed on an exact calculation. Some 
interesting comments on round-off noise are given in R. W. 
Hamming, Numerical Methods for Scientists and Engineers, 
McGraw-Hill (1962), Chapter II. It is also pointed out in 
this chapter that in many circumstances the size of the 
leading (i.e. most significant) digit in the numbers of a 
calculation may tend to be small. Thus if two numbers 
are multiplied together, each being chosen at random from 
1, 2, ... 9, then by examining the leading digits in the 81 
possible products it is easily seen that it is probable that 
the leading digit will be a number with a small leading digit. 

A theoretical analysis of error, statistical or otherwise, 
is invaluable when comparing the relative efficiencies of 
various methods for solving a given type of problem. Even 
when it is possible to estimate errors within useful limits, 
however, we are usually interested in the actual error in 
the answer to a specific concrete example rather than the 
average or maximum error when the method is applied to a 
class of examples. For this reason we shall not devote 
much space in this book to the determination of maximum 
or average errors in computing procedures. Instead the 
following approach is adopted. After a general analysis 
of the sources of error in a given method, the numerical 
procedure is arranged so that an estimate of the accuracy 
of the answer can be obtained empirically from the results 
obtained during the course of the calculation. Some of 
the ways in which this can be done will be explained later in 




connection with particular examples, but the general 
principle is stated here to emphasise that from the point of 
view adopted in this book the object of theoretical error 
analysis is practical error control. 

§ 1.5. The evaluation of formulae on desk machines. It 

will be assumed in this section that the reader is familiar 
with the operation of an ordinary desk calculator possessing 
a setting register, a multiplier register, and an accumulator. 
For example, in forming the product ab the number a is 
inserted in the setting register. When this number is 
multiplied by b the number b appears in the multiplier 
register and the product ab in the accumulator. 

Without going into detail we emphasise briefly several 
aspects of the use of such machines, with particular reference 
to the way in which good machine technique can prevent 
the occurrence of mistakes. An elementary example occurs 
in connection with reading negative numbers off the machine. 
Negative numbers appear in the accumulator as comple- 
ments, for example 99 . . . 994 732 represents the number 
- 5268. In reading such a number off the machine, the 
safe procedure is to set the complement, calculated mentally, 
(namely 5268), in the setting register; add into the 
accumulator; check that the accumulator then reads zero, 
and record the number in the setting register. Another 
example occurs in connection with transferring numbers 
from the accumulator to the setting register. The number 
which is entered in the setting register should be checked in 
an obvious way by subtracting it from the contents of the 

On the whole it is a good idea to do as much work as 
possible on the machine. For instance unnecessary 
recording of numbers should be avoided. Thus axbxc 
and axb+c can be calculated without recording inter- 
mediate numbers. Similarly expressions like 

a 1 b l +a 2 b 2 +a 3 b i + ... 




can be calculated without intermediate recording. In this 
type of calculation (as in any calculation involving a 
repetitive procedure) it is essential to fix the positions of the 
decimal points in the setting register, the multiplier register, 
and the accumulator, subject of course to the condition that 
10x10 = 10. 

Care has to be exercised when transferring numbers 
from the work sheet to a calculating machine or vice-versa, 
or from one part of the sheet to another. Common errors 
occur from transposing digits (e.g. 8452 may be transferred 
as 8542), or from repealed digits (e.g. 8455 may be trans- 
ferred as 8445). 

Wherever possible the work should be reduced to a 
routine. Tabulation is a great help in this respect. It is 
usually best to arrange a tabulation so as to do only one 
thing at a time. Differencing procedures are often useful 
in checking (§ 7.3). 

Ex. 1.3. Find f (I -518) by linear interpolation, given that 
/(I -300) = 0-752 15, /(I -600) = 0-746 39. 
Also find the value of x for which f(x) = 0-750 61. 

If we wish to evaluate y = f(x) by linear interpolation 
given y { = f(x t ) and y 2 =/(v 2 ), we mean that we wish to 
evaluate y assuming that the point (.v, y) lies on the straight 
line joining (.v,, >•,) and (x 2 , y 2 ). It is easily verified that the 
required formula is 

y = {(.v, -.v)>' + (A-.v )j' I }/(.v 1 -.v ). (1.3) 

For computational purposes it is desirable to rearrange this 
formula as 

}' = }'o+P()'i - >o). P = (v--v )/(.v, -.v ). (1.4) 

(This has been explained in Ex. 1.2. Compare (1.1), (1.3) 
and (1.2), (1.4) with b = (a,-*), a = (.y -a), D = y„ 
A = y ). 




In this example h = 0-21 8/0-300 = 0-727, to three 
figures. A safe procedure on a calculating machine is as 
follows. Set 0-752 15. Multiply by 1000, where we use 
three decimal places since // is specified to three places. 
Clear the multiplier register. Calculate y l —y mentally 
(-0005 76). Set 0005 76 and turn the handle once 
backward (since }\-y is negative). Check that the 
accumulator now reads /(1. 6) = 0-746 39. This checks the 
setting of /(I -3) and the mental calculation of y t — y , in- 
cluding its sign. The multiplier register now reads 1 000. 
Turn the handle until it shows 0-727. The number in the 
accumulator is /(l -518) = 0-747 96, rounded to five 

To answer the last part of the question, which involves 
" inverse linear interpolation ", turn the handle till the 
accumulator shows the number nearest to 0-750 61. The 
number in the multiplier register is then 0-267. The required 
value of a- is therefore 1 -300 + 0-3 x 0-267 = 1-380, to three 
decimal places. 

Ex. 1.4. Evaluate the following polynomials to four decimal 
places for x = 0(0-2)1-0: 

/,(*)== a-J.y 3 +.;.y 5 , 

f 2 (x) = 0-995 36v- 0-288 69.v 3 + 0-079 34.* 5 , 

/j(.v) = 0-994 95.V- 0-287 06.v 3 +0078 04.v 5 . 

Compare the results with the corresponding values of 
f(x) = arc tan .v. 

The notation x = a(p)b means " from the value x = a 
to the value x = b, inclusive, at intervals of p in a- ". Since 
four decimal places are required in the answer, five decimals 
should be carried in the working. We write 

/.(*) = ix(3-x 2 +0-6.x 4 ). 

The values of/,(.v) can be found systematically by means 







of a tabulation with the following headings: 



The value of $x is calculated mentally. The values of x 2 
and .v 4 can be obtained from Barlow's Tables j The value 
°Ui(x) is obtained in one desk machine operation involving 
two additions and two multiplications. Three numbers 
are recorded in addition to x and/,(.v). It is convenient 
to set .v 2 , x* on the machine and to use 0-6 and J.v as 

For /,(.v) we can proceed similarly except that it is 
slightly more convenient to write 

f 2 (x) = a(0-995 36-0-288 69.v 2 + 0-079 34a 4 ) 

and the table heading g# is now omitted. The evaluation of 
f 2 {x) is carried out in one machine operation involving 
three multiplications and one addition. Two recordings arc 
required besides x and/ 2 (.v). It is convenient in this case to 
set the five decimal place coefficients in the setting register, 
and multiply by .v 2 , x*. 

If we had to evaluate /,(*) for .v = 0(0-171)0-5*, say, 
then .v 2 , .\- 4 would have to be computed on the desk machine 
so that the procedure in the last paragraph would require 
five multiplications and one addition. In this case we 
should probably prefer to use nested multiplication: 

a lX + a 3 x 3 + a s x 5 = x^+ufai + fls")}. 

where u = a 2 . This requires four multiplications and two 
additions. Only one recording is required besides a- and 
/■>(*)• In hand computing, the number of decimal places 
and significant figures needed at various stages of the 
calculation have a decisive effect on the organisation of the 
calculation. In an automatic computer all numbers are 
treated alike irrespective of the number of significant figures 

t Edited by L. J. Comrie, E. & F. Spon (London) (1952). 


they contain, and the method used for evaluating a poly- 
nomial will not depend on whether the coefficients or the 
values of x arc simple. 

The polynomial/, (a - ) consists of the first three terms of 
the Taylor scries expansion of arc tan x. The accuracy of a 
truncated Taylor series decreases with increase in the dis- 
tance from the point about which the function is expanded. 
This is illustrated by the numerical results in Table 1.1. The 
polynomial / a (x) gives a much better approximation over 
the whole range of a- although it is not so accurate as/,(.v) 

Table 1.1 

Results for Exercise 1.4 


0-2 0-4 0-6 




01974 0-3805 0-5404 




-2 -32 




+ 6 -6 

+ 2 



+7 +1 -6 

+ 1 


near x = 0. The polynomial / 3 (.v) is derived in Ex. 7.5 by 
truncating a Chebyshev series. It is almost as good a 
representation of arc tan x over the whole range as the 
polynomial f 2 {x) which is the optimum polynomial in the 
minimax sense explained in § 7.5. (See also Ex. 3.6 and 
the reference given there.) Polynomial representations like 
fi(x),fi(x) are particularly useful in automatic computing. 

§ 1.6. Mistakes. The main part of this section is 
concerned with mistakes due to faulty computing technique 
when using a desk calculating machine. No matter how 
carefully a computation is performed there is always some 
risk that mistakes will occur and it is essential to include 
checks throughout a calculation to ensure that mistakes 
will be detected as quickly as possible. 




Experience is the best teacher in showing the beginner 
where mistakes are likely to occur and how mistakes can 
be avoided. It is sufficient to mention some of the more 
obvious sources of error— illegible ligures, untidy arrange- 
ment (particularly when figures are arranged in columns), 
unsystematic work, trying to do too many steps at once. 

To some extent computing technique will depend on the 
individual and the following notes represent personal 
preferences. It is convenient to work in ink on paper ruled 
in rectangular boxes, or on squared paper with numbers 
inserted in suitable boxes formed from the squares. A 
mistake is corrected by drawing a line through a wrong 
number and inserting the corrected value above the wrong 
one. This is to be preferred to the use of pencil and rubber 
for various reasons. Use of ink encourages accuracy. It 
is useful to know when mistakes have been corrected, and 
exactly what corrections have been made, partly because it 
is easy to make further errors when correcting mistakes. 
Apart from anything else the work-sheet will soon look 
unsightly if too many mistakes are made and this is a sign 
that the computer should take more care— or even start 
again from the beginning. 

A point that is often neglected is to include sufficient 
explanation on the sheets— a calculation should be intel- 
ligible a year later. It is advisable to include details of all 
the steps in the calculation in sequence— scribbling in odd 
corners of the sheet to obtain intermediate results is to be 
discouraged. It is precisely in these intermediate results 
that mistakes are likely to occur. 

When an answer has been obtained it may be easy to 
check whether it is correct, for example by substitution. 
On the other hand the only method of checking may be to 
repeat the calculation. (Straightforward repetition by the 
same person is not satisfactory since there may be a tendency 
to repeat certain kinds of mistake if a calculation is repeated 
in the same form by the same person.) Between these two 




extremes it is not unreasonable to ask that ten or twenty 
per cent of the time spent on a calculation should be 
devoted to checks. It is often convenient to arrange a 
calculation so that a difference check can be employed 
(ij 7.3). One has to be doubly careful when considering the 
correctness of results obtained by means of automatic 
computers since usually an automatic machine produces 
answers without any detailed information about inter- 
mediate steps. It is therefore very easy to be unaware of 
loss of accuracy due to the particular way in which the 
calculations have been performed inside the machine. 

It is important to try to develop a kind of " numerical 
common-sense " to detect obvious blunders and to keep a 
mental check whether the results of a calculation arc turning 
out as one would expect. 

The length of time taken over a computation is of little 
importance compared with the necessity for obtainina a 
correct answer. 

Examples I 

Ex. 1.5. Perform each of the following calculations on 
a desk machine without recording intermediate results. 
State the accuracy of each answer and round off to the 
appropriate number of significant digits: 

40963- 53-41 -492-0+ 63-41 5, 


2-4076 x 0049 1 8 x 0000 963, 

(0-4362 x 0029 432) -=-0-007 44. 

Ex. 1.6. Find 739-^/38 to five significant figures, 
given v /39 = 6-2450, ^38 = 6-1644. 

Ex. 1.7. A change of 8x in x produces a change of 
approximately (df/dx)8x in f(x). Hence show that if in 




the determination of cot 0128 the argument is correct to 
the number of figures stated, the cotangent is indeterminate 
to within 003. Check this result by finding from tables 
the values of cot 0-128 and cot 0-1285. Similarly find the 
degree of determinacy in the following quantities: 

sin 1-560, cos 1-560, tan -1 150, log 1-083, cxp (-2031). 

Find also the number of significant figures in each quantity 
and check your results from tables. (The point of this 
example is to remind the reader that when y = f(x) is read 
from a table the number of significant figures in y is not 
necessarily the same as the number of significant figures 
in .v.) 

Ex. 1.8. Evaluate (cosh 0-cos 0)/(sinh 0-sin 0) to 
five significant figures for - 0-3741 by expanding 
appropriate terms as power series in 0. 

Ex. 1.9. Distinguish between fixed- and floating-point 
calculations and compare the maximum errors involved in 
the two systems when computing (i) the sum of n numbers 
(ii) the product of// numbers (iii) the sum of// products of 
the form a ( b,. 



§2.1. Introduction. In this chapter we introduce an 
important class of numerical procedures, namely iterative 
methods. These are illustrated by describing typical 
iterative procedures for solving transcendental and algebraic 
equations. (Equations like cot x = 2.v, x In x = 4, con- 
taining transcendental functions, arc called transcendental 
equations. A simple algebraic equation is .v 3 = 3.V+2. 
All of these equations can be written in the form/(.v) = 0.) 

The first step in an iterative solution of an equation 
/(.v) = is to locate the roots approximately. An easy 
method for locating any real roots is to draw a graph of the 
function y = f(.\) and find the points where the curve cuts 
the .Y-axis. Sometimes it is convenient to write f(x) = 
in the form/,(.v) = f 2 {x) and to plot two functions;- = /,(.v) 
and y = /,(.v). The real roots of/(.v) = are given by the 
abscissae of the points of intersection of the two graphs. 

Various methods are available for the numerical solution 
of equations and it lies outside the scope of this short book 
to describe and compare even the more important of these. 
It is not implied in this chapter that the iterative methods 
which we discuss are to be preferred to any other methods 
for solving equations numerically. If an equation is 
complicated and results are required to only a few decimal 
places there may be little point in using, for instance, a 
rapidly convergent iterative method like the Newton- 
Raphson process of § 2.3. It may be much more convenient 








and satisfactory to use repeated plotting on successively 
enlarged scales (compare the last paragraph) or the method 
of false position (Ex. 2.12). On the other hand, when 
finding the complex roots of high-order polynomials it may 
be desirable to use sophisticated methods such as those of 
Graeffe or Bernoulli. 

§ 2.2. A simple iterative method. Consider the equation 

a- 2 -5.y+2 = (2.1) 

which has the roots a, ji, say, where 

a = 2-5-74-25 = 0-438 447, 

ft = 2-5 + 74-25 = 4-561 553. 

In order to illustrate a simple iterative procedure we 
shall solve (2.1) in the following way. The first step is 
to localise the roots approximately. By evaluating 
y = .v 2 -5a-+2 for x = 0, 1, 2, ... we find that (2.1) has 
one root between and 1, and one root between 4 and 5. 
We then rearrange (2.1) in the form 

.v = 0-2.v 2 +0-4. 

This equation can be solved iteratively by writing 

*,+, = 0-2.y 2 +0-4, r = 0, 1, 2, ... (2.2) 

where x r denotes the rth approximation to a root. We 
start by choosing, say, a = as an approximation to one 
of the roots of the quadratic. If this is substituted on the 
right of (2.2) we obtain .y, = 0-4. This value is then 
substituted on the right of (2.2) to give .y 2 , and so on. In 
this way we find 

a- 2 = 0-432, .Y 3 = 0-437 32, .y 4 = 0-438 25, (2.3) 

where in the last two calculations wc have worked to five 
decimal places. It is clear that, as r increases, x, is 

approaching the smaller root a of the quadratic. If on the 
other hand we start with a- = 10 we find 

x t = 20-4, x 2 = 83-6, x 3 = 1398-2, 

and as r increases x r increases without limit. Before 
considering the reasons for these results we give some 

A calculation of the form 

-Y r+1 = F(x,), r = 0,1,2,..., 


where we start from an assumed value for a , and then 
calculate x b x 2 , ... in succession, as indicated in the 
example above, is called an iterative procedure. If as r 
tends to infinity x r tends to a definite value then the pro- 
cedure is said to converge. If as r tends to infinity x r does 
not converge (for example if x r tends to infinity) then the 
procedure is said to diverge. 

We examine theoretically the convergence of the iterative 
procedure (2.4) by a method which will be useful in later 
sections. On setting x = x t +&x„ where a- is an exact root, 
equation (2-4) becomes 

a— <5A- r+1 = F(x-8x r ). 
From (2.4), if the procedure converges, 

X m f(x), 



On expanding the right-hand side of (2.5a) in a Taylor 
series and using (2.5b) we find 

5x, + l = 5x r F'(x)-l-5x?F"(x) + .... 


In this section we assume that F'(x) is non-zero. When 
the difference between x and x, is small we can ignore the 
higher-order terms on the right of (2.6a) which then gives 

dXf+t ss 5x r F'(x). 





Repeated application of this formula gives 

Hence if | F'(x) | < 1 we obtain the result that Sx r tends to 
zero as r tends to infinity i.e. the iterative procedure 

A more rigorous derivation is the following. Instead 
of the infinite Taylor series which led to (2.6a) we use the 
Taylor series with remainder (or equally the mean value 
theorem for derivatives)!: 

F(x-6x r ) = F(x)-dx r FXZ), 

where f lies in the range x to .v r = .v-<5.\- r . It is assumed 
that F(£) exists in this range and that F(£) is continuous in 
the range including the end-points. Then (2.5a, b) give, 
instead of (2.6), 

<5.v r+1 = 5x,F'(0 

where I; lies between x and .v r . Suppose that 

I F'(0 | < M (2.7) 

for the range of g in which we are interested, where M is a 
constant independent of r. Then 

|5.v r+1 |<M|«5x r |<M p+1 |^o|. (2.8) 

If M< 1 then Sx, tends to zero as r tends to infinity. A 
sufficient condition for convergence of the iterative pro- 
cedure (2.4) is therefore that | /•"'(£) | < 1 for { near the 
required root. 

It is instructive to illustrate this result graphically as in 
Fig. 2.1(a-d) where we have drawn y = x and y = F{x) for 
several different functions F{x). Starting from an arbitrary 
value x of x on the line ;' = x we move vertically to the 
curve y = F(x) and then horizontally back to the line 
y = x. From (2.4) this gives a,. Repetition of the pro- 
cedure gives .Y 2) .r 3 , ... in succession. It is easy to see whether 

t J. M. Hyslop, Real Variable, Oliver and Boyd (I960), § 6.2. 




the procedure converges or not. From the diagrams in 
Fig. 2.1 it is clear that convergence to the required root can 
occur only in cases (a) and (b), namely when | F'(x) \ < 1 
and the first approximation is sufficiently near the required 

Even if there are mistakes in the intermediate steps the 
above iterative procedure will give the correct final answer 
provided that the iteration ultimately converges to the 
required root (cf. Ex. 2.14). On the other hand mistakes 
may delay convergence or cause convergence to another 
root. One simple check that should be applied mentally 
in hand computation is that, as the calculation proceeds, 
the x r tend to a limiting value monotonically as in the case 
of Fig. 2.1(a), or they lie alternately on either side of a 
limiting value as in the case of Fig. 2.1(b). 

We now return to consideration of the iterative pro- 
cedure (2.2) for which F(x) = 0-2a- 2 +0-4, F'(x) = 0-4.v. 
There are two roots a«0-44 and /J%4-6 so that | F'(a) \ < 1 
and | F'O?) | >1. Hence the above theory tells us that if 
we try iterating with an initial approximation near a the 
procedure will converge to this root but if we start with an 
initial approximation near/? the procedure will not converge 
to p. The iteration can be examined graphically as in 
Fig. 2.2(a). Proceeding as for Fig. 2.1. it can be shown 
that if we choose an initial approximation .v such that 
\ x o\<P then the iterative procedure will converge to the 
smaller root a, but if | .v | >fi the procedure will diverge. 

There are of course a large number of ways in which 
we can rearrange an equation/ (a) = in the form x = F(x) 
and each rearrangement will give an iterative procedure of 
type (2.4) with, in general, different convergence properties. 
Thus if we rearrange (2.1) in the form .v = 5-(2/x) we 
have the iterative formula 

- v r+i — 5 . 




In this case F'(x) = 2/.v 2 so that | F'(x) | > 1, | F'(j?) | < I, 
and the procedure will converge to the larger root /? if we 
start with an initial approximation near the larger root but 
it will not converge to the smaller root a if we start with an 
approximation near to a. The situation is illustrated in 

(a) 0^F%x)<l 

y-- n.o , 

Fio. 2.1 

(c) F%\)>\ (d)F(.v)<-l 

Diagrams illustrating the iterative procedure .v r| , - F(.v r ) 

Fig. 2.2(b) from which it is clear that the procedure ulti- 
mately converges to the larger root whatever first approxi- 
mation x is chosen. 

Still another example is illustrated in Fig. 2.2(c) which 
shows the iterative solution of the trivial equation x = -x 3 . 
The procedure converges to the root x = if we choose 
I a- | < 1 , but diverges if we choose | .v | > I . 

3 2.2 



The discussion given above has illustrated the following 
points in connection with iterative procedures: 

(i) If we wish to obtain a specific root of an equation 
it may be necessary to try various iterative 
methods obtained by rearranging the equation 
f(x) = in the form x = F(x) in different ways 
until we obtain a sequence which converges to the 
required root. 

(a) xr+i - 0-2*»0-4 (b) .v,+ i - 5-(2/.v,) (c) »+l - -x} 
Fio. 2.2 Examples of the iterative procedure x r + 1 = F(x r ) 

(ii) It is important from a practical point of view that 
convergence should be as rapid as possible. For 
rapid convergence of the procedure (2.4) discussed 
in this section it is clear from (2.8) that | F'(v) | 
should be as small as possible, and .v should be 
chosen as near to the required root as possible. 

(iii) In considering whether an iteration is converging or 
not it may be necessary to ignore the first few 
iterations since the procedure may appear to 
diverge initially, even though it ultimately 

(iv) In the iterative procedure considered in this section 
the correct final answer is obtained even though 




there are mistakes in the intermediate steps, 
provided that the iteration ultimately converges to 
the desired root. 

The rate of convergence of iterative methods of the type 
discussed in this section tends to be rather slow, and in 
practice a device is used to accelerate the convergence, as 
discussed in § 2.4 below. 

§ 2.3. The Newton-Raphson iterative method. Let X be 
an approximate root of f{x) = and suppose that the 
exact root is x = X+f,, so that h is a small quantity. Then 
f{X+h) = and, expanding by Taylor's theorem, we have 

AX + h) = = AX) + hf'(X) + \h 2 f"{X) + .... (2.9) 

If /; is small and f(x) is a well behaved function in the 
neighbourhood of the required root, the terms in I, 2 , ft 3 , ... 
can be neglected, so that 

The Newton-Raphson iterative formula is obtained by 
writing this result as an iterative procedure: 

*""*-$$ <2J0) 

where, as in § 2.2, we start with an estimate .v and calculate 
in succession .v,, x 2 , .... 

The geometrical interpretation of the Newton-Raphson 
method can be seen from Fig. 2.3(a). Suppose that the 
curve /(*) = cuts the .v-axis at A, so that OA = x. If 
-v, = OC, then CD «/£g and tan CBD =/'(*,)• Also 
BC = CD cot CBD. Hence 

OB . OC-BC = .W(.v r )//'(.v,). 

so that OB = .v r+I , from (2.10). The diagram indicates 
that once the procedure starts to converge towards a root, 




it will converge from one side. Under exceptional 
circumstances this may not be true, for instance if/" = 0, 
(/"//') <0 at the root. Also the first few iterates may 
oscillate on either side of a root, as in Fig. 2.3(b). 

Difficulties may arise if/'(v) = near a root of/(.v) = 
which it is required to determine, particularly when the 
equation has two nearly equal roots in this region. These 
difficulties can often be clarified by plotting the curve 

(a) (b) 

Fio. 2.3 Convergence of the Newton-Raphson iterative procedure 

y = /(•*) for values of x near the required root and ex- 
amining the geometrical interpretation of the Newton- 
Raphson procedure. Thus in Fig. 2.3(b) if we start from 
the abscissa P we certainly do not converge to either of the 
roots at E, F though of course the procedure converges to 
another root of the equation. It is usually sufficient to 
ensure that the .v r approach the required root of the equation 
from the side opposite to that on which f'(x) = 0. If 
/'(.v) is small near the required zero of/(.\) = then it may 
be necessary to compute /(.v r ) and/'(-v r ) to a high degree of 
accuracy in order to obtain an accurate estimate of *,+ ,. 
This indicates that the equation is " ill-conditioned ". The 
meanings of the words " ill-conditioned " and " accurate " 
are discussed in connection with polynomials in § 2.6. 




8 2.3 



so that 

To examine the rate of convergence of the Newton- 
Raphson method we use the results in § 2.2, (2.4)-(2.6). To 
apply these to the present case we must set 

/"'(.v)=/(.v)/"(.x)/{/'(x)} 2 = 0. 
F*(x) =f"(x)lf'(x)*0, in general, 

where in both equations we have used the fact that/(v) = 
Hence from (2.6a), since 6x, - x-x r , 

x -Xr + i = -ip(x-x r ) 2 + ..., where />= /"//". (2.11) 

If x r is nearly equal to the required root x, the higher order 
terms can be neglected and the error in A r+1 is proportional 
to the square of the error in x r . This means that the rate of 
convergence of the Newton-Raphson procedure is very much 
more rapid than the rate of convergence of the simple 
iterative method discussed in § 2.2. This is illustrated in 
the following examples where the final estimate of x is 
obtained by repeating the iteration until .v r+1 and x r agree. 
Care may have to be exercised when using this criterion 
when working to a fixed number of decimal places, as 
discussed in § 2.6 below. Some general aspects of rates of 
convergence, including estimation of the rate of convergence, 
are discussed in the next section. 

Ex. 2.1. Find the roots of .v 2 -5.v+2 = to five decimal 
places by the Newton-Raphson procedure. 

If we choose/ (a) = .v 2 -5.v+2 then (2.10) gives 


r+l — 

_ *?-2 

2x r -5 

On taking a = we find .v, = 0-4 and .v 2 = 0-438 095 
* 3 = 0-438 447, x 4 - 0-438 447. The extremely rapid 

convergence is evident. (Compare results (2.3) from the 
procedure in § 2.2.) On starting from .y = 40 we find 
X, = 14/3 and x 2 = 4-564 104, x 3 = 4-561 554, * 4 = x s 
= 4-561 553. In this example it is convenient to work to 
six decimal places throughout. In calculating x 2 it would 
have been sufficient to have worked to three decimal places, 
but we did not know this beforehand. In more complicated 
examples it may be convenient to work to a small number of 
decimal places in the earlier iterations until the procedure 
starts to converge rapidly. 

When using the Newton-Raphson procedure it may not 
be necessary to recompute /'(.v) at each iteration. It may 
be sufficient to use an approximate value of/'(v)- Thus in 
the above example we can write 

x r+\ — x r — 

x f 2 -5x f +2 
(2x r -5)* ' 

where the star denotes an approximate value. On choosing 
x = we find a-, = 0-4. To obtain a simple formula we 
set (2v,-5)* = -4 and use the iteration 

.v r+1 - x,+l(x;-Sx, + 2) = i(A r 2 -A r + 2). 

With .v, = 0-4 this gives 

X a = 0-44, .v 3 = 0-4384, .v 4 = 0-483 449. 

These figures should be compared with those obtained 
previously. Although the rate of convergence is of course 
not so rapid as in the true second-order procedure, it is 
still remarkably good considering the crudeness of the 
approximation used for 2a, - 5. Although we have approxi- 
mated tof'(x) it is essential to calculate/ (a) to full accuracy. 
It often happens in iterative processes that we can be 
relatively crude in certain parts of the calculation, but 
accuracy must be maintained in certain vital parts of the 




Ex. 2.2. Find v /ll to five decimal places by an iterative 

We wish to solve the equation .v 2 = a. On writing 
/(.v) = x 2 — a the Newton- Raphson formula gives 

.v r+1 = x r -±(x r - -J = ±(x r + - 

where the first form is convenient for hand computation 
since x,-(a/x r ) is a small quantity. If we choose .v = 3 

.v, = 3-33, x 2 = 3-316 65, x 3 = 3-316 62, 

where x 3 is correct to five decimals. This is a very con- 
venient method for finding square roots on a desk calculator. 

Ex. 2.3. Solve x = exp (l/.v) to five decimal places. 

Various formulae can be obtained for the iteration by 
rearranging the equation in different ways. If we choose 
/(.v) = x-exp (1/x) the Newton-Raphson formula gives 

x, +1 = .v,-x^{x r -exp(I/.v r )}/{x 2 +exp(l/x r )}. (i) 
If we write x = l/y so that/OO = 1 ->• exp y, we find 

JV+ 1 = >V+{exp {-y r )-y,W +y r ). (ii) 

If we take logarithms and write /(.v) = 1-x In x, then 

*r+ 1 = v r +(l -x. In .v,)/(l +ln .v f ). (iii) 

Forms (ii) and (iii) are more convenient than (i). 

It is found graphically that there is only one root near 
x = 1-8 (or>> = 0-6). With initial approximation y = 0-6 
formula (ii) gives successively 0-568, 0-567 144 i.e. 
x = 1-763 22. With initial approximation .v = 1-8 formula 
(iii) gives successively 1-7634, I -763 22. The final iteration 
should of course be repeated as a check. 

5 2-4 



§ 2.4. General aspects of iterative procedures. If an 
iterative procedure for finding a quantity x by calculating 
successively x,, x 2 , ... is such that 

x-x r+1 = a k (x-x,) k + a k+l (x-x,) k+, + ... (2.12) 

where a k is a non-zero constant, we say that the iterative 
process is of the fcth-order. We see from (2.6b) that the 
simple procedure in § 2.2 gives in general a first-order 
process, and from (2.11) that the Newton-Raphson 
procedure is second order. 

The order of the iterative procedure x r+ , = F (x r ) for 
finding a root x can be obtained by expanding F(x r ) in a 
Taylor series around x, as in (2.6a). If the first non-zero 
derivative of F(x) is of the Arth order then the iterative 
procedure is also of the kth order, since (2. 1 2) and (2.6a) 
then agree, with k\a k = (-l)* +1 F ( *>(x). If F(x) has a 
simple form it may be easy to determine the order of an 
iterative procedure by finding the order of the first non- 
vanishing derivative, as in the following example. If F (x) is 
complicated it may be simpler to obtain the expansion 
(2.12) directly, as in the argument leading to (2.17) below. 

Ex. 2.4. Show that x r+ , = x r (3-3ax r +a 2 x r 2 ) is a third- 
order process for the calculation of I /a. 
We have 

F{x) = 3x-3ax 2 + fl 2 x 3 , F'(x) = 3(flx-l) 2 , 

F"(x) = 6a(ax- 1), F'\x) = 6a 2 . 

The equation x = F(x) has three roots, namely x = 0, 
1/a, 2/a. If x'= the process is first-order with F'(x) = 3 > 1, 
and therefore the iteration will never converge to x = 0. 
Similarly if x = 2/a the process is first-order with 
F'(x) = 3 > 1 and the iteration will not converge to x = 2/a. 
The remaining possibility is x = 1/a and then F'(x) = F"(x) 
= 0, F"(x) yt 0, so that we have a third-order process for 
the calculation of 1/a. 







We shall use the following notation for the approximate 
relations between the errors in x, and .v r + , for first- and 
second-order procedures, respectively: 

.v-* r+ , = M(x-x r ), 
x-x r+l = N(x-x r ) 2 . 


A second-order process is extremely rapidly convergent 
once the error becomes small enough. If the error at any 
stage of an iteration is S, the error in successive iterates in 
a second-order process is 

NS 2 , N 3 8\ N 7 S 8 , ... N~\NS) 2 \ .... 

Hence the procedure is convergent if | NS | <1, and the 
smaller | NS | is, the more rapid the convergence. The 
term | NS | 2 " ultimately goes to zero extremely rapidly as 
r increases, for any value of | NS | less than unity. If 
NS = 01, then the sequence (NS) 2 , (NS) 4 , (NS) 8 , ... gives 
001, 0-0001, 000000001, .... If NS = 0-9 we have, 
approximately, the sequence 0-81, 0-656, 0-430, 0185, 

0054, 0-0029, 0000 008 4 The convergence is initially 

slow but ultimately rapid. Once the state of rapid con- 
vergence is reached, each iteration will roughly double the 
number of correct figures in the answer, but it is misleading 
to state without qualification that second-order processes 
double the number of significant figures per iteration. For 
instance the iteration may be started with a value of x so 
far from the required root that the second-order analysis 
may not apply to the first few iterations. In this case the 
initial convergence may be very slow. In fact one of the 
problems in applying second-order processes is to arrange 
that the state of rapid convergence is reached after as few 
iterations as possible (compare Ex. 3.1, Program 3.5). 

In first-order processes the behaviour of the error is 
completely different from that just discussed in connection 
with second-order processes. If the initial error is 5, the 

errors in a first-order process are successively 

MS, M 2 S, M 3 S, .... M'S, ... . 

The error is reduced by a constant factor M in each iteration. 
This means that there is no ultimate extremely rapid con- 
vergence as in second-order processes. 

Because of the comparatively slow convergence of first- 
order processes it is often desirable to improve the rate of 
convergence by the following important method. From 
(2.13) we assume that, for the first-order procedure under 

X-x r =M(X-x r . l ), 
X-x,^ = M(X-x r . 2 ), 

where X is an approximation to the root. On eliminating 
M and rearranging we obtain an expression for X in terms 
of x r , .Y r _,, .v r _ 2 which will be, in general, a better approxi- 
mation to the root than any of the iterates. We find 

X = X r X r-2~ x r- i 

X r — 2.X,_i+X r _ 2 

= X, 

(*,-*,-, ) 2 


-x r -2.x r _,+x r _ 2 ' 

where the second form is more convenient for calculation 
(compare Ex. 1.2). The use of (2.15) is usually called 
Aitken's 5 2 -process since A. C. Aitken popularised its 
applications in numerical analysis. It goes back at least 
as far as Kummcr in 1837. As an example consider the 
figures (2.3): 

x 2 = 0-432, .y 3 = 0-437 32, .v 4 = 0-438 25. (2.16a) 

Substitution in (2.15) gives 

A' = 0-438 25 -(0-000 93) 2 /(- 0-004 39) = 0-438 45. (2.16b) 




If we use this value as a new .v in (2.2) and start a new 
sequence of iterations we find that the first iterate repeats 
the value 0-438 45 so that the iteration converges to this 
estimate of the root. 

(It is convenient at this point to anticipate notation 
introduced in § 7.2 in order to explain the name " 5 2 - 
process ". Suppose that the numbers (2.16a) arc arranged 
in a difference table: 





0432 00 



0-437 32 



0-438 25 


The quantity x r -x r _, (= 93) is the first difference Sx,^ 
and x r -2v,_ 1 +.v,_ 2 (= -439) is the second difference 
(5 2 x r _ ,, in units of the fifth decimal place. These numbers 
occur in (2.16b) above. The name " 6 2 -process " comes 
from this connection with differences.) 

This type of procedure, namely using several successive 
iterates to obtain an improved estimate of the root, which 
is then used as an initial approximation for a further 
sequence of iterations is called accelerating the convergence. 
It must be remembered that in the derivation of formulae 
for accelerating convergence we assume that round-off 
errors are negligible. The formulae will not be valid if 
round-off errors are comparable with the differences between 
successive iterates. Although wc stated at the end of 
§2.2 that mistakes are usually not very important in 
iterative procedures this is obviously no longer true when 
accelerating procedures are used. An accelerating procedure 
may make convergence worse if the assumptions on which 
the procedure are based are not satisfied. Thus Aitken's 
<5 2 -proccss is based on (2.13) which assumes a first-order 




procedure. Consider the following figures from Ex. 2.1 
illustrating the Newton-Raphson second-order procedure: 

.v, = 4-666 667, x 2 = 4-564 104, x, = 4-561 554. 

The c5 2 -process indicates that .v 3 is in error by 0000 065 
whereas the error is in fact only 0000 001. 

To examine the convergence of Aitken's process, 
suppose that a first-order procedure gives three successive 
iterates x , x,, x 2 which are used to obtain a new initial 
value x{,° by means of (2.15). From xj, 1 ' we deduce x ( ,°> 
x 2 " by the first-order procedure and then use (2.15) to 
obtain x{, 2) , and so on. From the definition of a first-order 
procedure we have 

X-X, = fl,(x-x ) + a 2 (x-x ) 2 + ..., 
x-x 2 = a l (x-x l ) + a 2 (x-x 1 ) 2 + ... 

= flj(x-x )+fl 1 a 2 (l + a,)(.t-x ) 2 + ..., 
where a x , a 2 are constants. Hence 

x 2 -2x,+x 

= -(l-a I ) 2 (x-x ) + fl 2 (2-a l -a 2 )(x-x ) 2 + ..., 
(x 2 -x t ) 2 

= fl 2 (l- fll ) 2 (v--Vo) 2 + 2fl 1 a 2 (l-2fl l -fl?)(x-x ) J + .... 
On substituting these results in (2.15) with r = 2 we find 

X-x4" - C(x-xo) 2 , (2.17) 

where C is a constant. Similarly, in general, 

x-x£ +,) = C(x-x< ") 2 , 

so that Aitken's <5 2 -process for accelerating convergence is 

The question of whether to use a first-order process with 







acceleration of convergence, or the Newton- Raphson 
second-order process, depends on the particular equation 
considered. As a broad generalisation, iff'(x) has a simple 
form then the Newton-Raphson procedure will be prefer- 
able but in complicated examples the accelerated first-order 
procedure may be simpler. Second-order processes are so 
quickly convergent that it is seldom necessary to consider 
third- or higher-order processes, although procedures of 
arbitrarily high order can be constructed for any given 
iterative problem. 

When using a first-order iterative procedure it is often 
useful to estimate the quantity M in (2.13), after each 
iteration. By subtracting the equations 

x-x r = M(x-x,-,), .v-.v f _, = M(x-x,- 2 ), 
we find 

M = *'"*'-' = -\ (2.18a) 

X r _|— .X r _ 2 / 

say, where /, the reciprocal of M, has been introduced since 
it is usually more convenient to work in terms of / rather 
than M. By inspection we see that equation (2.15) for the 
<S 2 -process can be written in terms of t as 

*-*+fZJ &-*-!> 


As an example the figures in (2.16a) give 

t = 532/93 » 6, 

X = 0-438 25+}(0000 93) = 0-4384 4 . 

From (2.6) we see that, when solving the equation x = F(x), 
the theoretical value of M is F'(x). In this example the 
theoretical value of / is therefore t = \jM = l/(0-4x)%6. 
However in practice it is useful to compute the empirical 
value of t after each iteration since if the successive 

estimates of t tend to a constant value this confirms that 
^-extrapolation will be valid. 

In a similar way when using a second-order procedure 
it is useful to estimate N in (2.14) after each iteration. 
Since a second-order procedure converges very rapidly we 
estimate N by replacing x by x, in the equation 

This gives 

x-x,-i = N(x-x r - 2 ) 2 . 

Nx(x r -x,- t )l(x,-x,- 2 ) 2 . 


Similarly to estimate x we assume that we can replace x by 
x r on the right of the following equation : 

This gives 

x— x r = N(x— x p _i) 2 . 
.x = x r +M-v,-* r -,) 2 , 


where it will be assumed that the value of N is estimated by 
means of (2.18c). In terms of the quantity t defined in 
(2.18a) this formula becomes 


though now l will not tend to a constant value independent 
of r as r increases. As a numerical example consider the 
following figures from Ex. 2.1: 

X, = 4-666 667, x 2 - 4-564 104, x 3 = 4-561 554. 


N ft) 0-0025/(0- 1) 2 = i, 

x-x 4 « i(00025) 2 = 0-000 001 6, 

which indicates that x t should be correct to within about 
two digits in the sixth decimal place. Alternatively we 



/ = 01/00025 = 40, 
x-x 4 « 0-0025/41 2 = 0-000 001 6, 

as before. The above formulae should not be applied until 
the iterative procedure has started to converge systematically. 

§ 2.5. Real roots of polynomials. The general poly- 
nomial equation of degree n is (assuming a ¥= 0): 




/>„(.*) = a x" + a l x''- , + ... + a„ = 0. 


The following properties of polynomial equations should 
be borne in mindf: 

(i) A polynomial equation of degree n has exactly n 
roots. (A factor (.v— a)" gives a repeated root 
which must be counted in times.) 
(ii) If the equation has real coefficients, complex roots 
occur in pairs. If p+iq is a root then p-iq is a 
root. An equation of odd order has at least one 
real root, 
(iii) If the roots arc x t , x 2 , ...x„ then 

-*,+x 2 + . .. + *„= -(a,/a ), 

x l x 2 ...x„ = (-iy(aja ), 

£x,x 2 ....v, - (-l) r (a r /a ), r = 2 to «-l. 

In this equation the sum is taken over all 
possible products of r distinct .v, chosen from the 
n roots, 
(iv) If one root is much smaller than the others then it 
is given approximately by a n _ 1 x+a„ = 0. 
Similarly the two smallest roots are often given 

t H. W. Turnbull, Theory of Ft/nations, 5th Edition (Oliver and 
Boyd, 1957). 

approximately by the roots of a n ^ 2 x 2 +a n . 1 x + a H 
= 0. If one root is much larger than the other it 
is given approximately by a^x+a^ = 0. The 
two largest roots are often given approximately 
by a x l +a i x+a 2 = 0. 

The Ncwton-Raphson formula can be applied to the 
polynomial equation (2. 19). If u is an approximate root of 
this equation then an improved root is given by 



It is desirable to systematise the calculation of P n {ii)jP' n (u). 
We consider only real u. Suppose that division of P„(x) 
by x-u gives 




P n (x) = (x-u)Q n .,(x)+B. 

P*(u) = B. 

/ , n 'W = Q™-iW+(v-«)0:_ 1 W. 

Suppose that division of Q n -i(x) by x-u gives 

<? M -iW = (x-u)R n -,(A:)+D. 


Pnd') = Q n -i(u) = D. 

From (2.20) the improved value of u is therefore given by 


The calculation of B by long-hand division, which also 


gives the coefficients of &-»(*)« proceeds as follows: 

X-U) Oq 

b -b u 


b } -b t u 



(bo *i b 2 

where b = a , b x = a^b^i, b 2 = a 2 + b,u, etc. The 
evaluation of B is essentially the evaluation of the poly- 
nomial P„(u) by nested multiplication: 

P M = [{(a u + a l )u + a 2 }u + a 3 ']ii + ..., 

mentioned in Ex. 1.4. It is convenient to tabulate the 
calculation in the form given below, where b„ = B = P„(u). 
We include also the calculation required for P' n {u) = D 
given by c„_, in the table 

a o, a 2 a 3 
b Q u b t u b 2 u 

"o bi b 2 b 3 
c u CjU c 2 u 

Cl c 2 c 3 

ft.-, b„ (2.21) 



The sequence is to calculate in succession b , b u, b u b,u, b 2 , 
etc. When the student has gained some practice it will be 
possible to form b, = a r +b r . t u in one machine operation 
without recording Z> r _,M, and similarly for c r , so that the 
second and fourth rows can be omitted in the above table. 
If several roots are required the roots should be deter- 
mined in the order of increasing absolute value, and it is 
usually satisfactory (and desirable) to remove them from 
the polynomial as they are determined. (See Ex. 3.3 where 
this procedure is discussed in connection with automatic 
computing.) The polynomial obtained by dividing P n (.\) 




by x—u is 

P n _ 1 (x) = 6 x"- , + b 1 x'- 2 + . .. + &„_„ 

so that the coefficients of P n _,(x) have been determined in 
the course of the Newton- Raphson procedure for finding 
the roots of P n (x). 

Ex. 2.5. Find the real root of the following equation to four 
decimal places. 

.v 5 + 2-653.v 4 +4-512.v 3 -2043.v 2 -0-263.v-0-251 = 0. 

A graph shows that the real root is near 0-5 and we take 
this as ,v . The procedure leading to the table (2.21), 
without the second and fourth rows, gives, on using five 
decimal places throughout for convenience: 

1 2-653 00 4-51200 -2 043 00 -0-263 00 -0-25100 (0-5 
1 315300 608850 100125 0-237 62 -0132 19 
I 3-653 00 7-915 00 4-958 75 2-717 00 

1 2-653 00 4-512 00 -2043 00 -0-263 00 -0-25100 (0 549 
1 3-202 00 6-269 90 1-399 18 0-505 15 026 33 
1 3-75100 8-329 20 5-97191 3-783 73 

The first correction is <5, = 0132 19/2-71700 = 0049, giving 
.v, = 0-549. The second correction is S 2 = -0026 33/ 
3-783 73= -0-006 96 giving .v 2 = 0-5420 4 . A further 
iteration gives a final value of 0-5419, correct to four 
decimal places. 

§ 2.6. Errors when finding roots of polynomials. When 
we wish to specify the accuracy of an approximation A" to a 
root x of an equation f(x) = there are at least three 
different ways in which the word " accuracy " can be 

(i) Assuming that/(.v) is specified exactly then | X-x \ 
is the absolute accuracy of X and | X-x |/| x | 







is the relative accuracy of X. We shall normally 
use the word " accuracy " in one of these senses. 
(ii) If we do not know x then the smallness of the 
quantity f(X) = R, say, gives some indication of 
the accuracy of X. The quantity R is called the 
residual. It is most important to realise that small 
R docs not necessarily mean that the absolute or 
relative error of X is small. This is discussed 
(iii) If the constants defining the equation f(x) = (for 
example the coefficients, if/ (a) is a polynomial) 
are not known exactly, then the uncertainty in 
these constants means that there is an uncertainty 
in the values of the roots of /(.v) = 0. The 
accuracy of an approximate root can be no greater 
than the uncertainty produced in the roots by 
possible variations in the constants defining the 

In some cases, with reference to (ii), small values of R 
can be produced by values of X appreciably different from 
the exact roots; and with reference to (iii) small variations 
in constants can produce large variations in the roots. 
These phenomena often appear together. In either case 
the equations are said to be ill-conditioned. As an example 
consider the equation 

/ (x) = x 2 - 1 064.V + 0-283 - 0, (2.22) 

which has roots 0-527, 0-537 to three decimal places. The 

x 2 -10641.v+0-283 = 

has roots 0-523, 0-541; changing a coefficient by I in 
10,000 has changed the roots by 1 in 130. If we substitute 
x = 0-532 in (2.22) then/ (.y) = R = -0000 024; changing 
x by 1 in 100 produces a residual which is one ten-thousandth 
of the smallest coefficient. It should be emphasised that 



these phenomena are inherent in the equation; they appear 
whatever method is used to solve the equation. 

Since in the above example a change of 0001 in a root 
produces a change of only 0000 005 in R, we need to work 
to six decimal places in order to verify that a root is accurate 
to three decimals. To examine the reason for this more 
closely, we note that in the Newton-Raphson method the 
correction to the root is -f(x)/f'(x). In the above example, 
although /(.v) = R is small, /'(*) is also small. For ex- 
ample, if we choose x = 0-525, 

f(x) m (0-525) 2 - I -064(0-525) +0-283 = 0000 025, 

/'(a) = 2(0-525)- 1 064 = 0014. 

We encounter the familiar difficulty that in computing f(x) 
and /'(v), the cancellation of large numbers gives small 
numbers, and in order to obtain a moderate degree of 
accuracy for their quotient we need to work to a large 
number of decimal places. This is not a criticism of 
Newton's method. The difficulty will occur in one form or 
another, whatever method is used to solve the equation. 

To illustrate by a numerical example, consider the 
formula obtained by applying the Newton-Raphson formula 

x 2 -0-283 

= 0-55, working to four decimal places, 

Iteration with .v 


v, = 


= 0-5417, x 2 = 


= 0-5361, (2.23) 

and then in succession 0-5366, 0-5326, 0-5833, 0-5575, 
0-5451, 0-5382, 0-5403, 0-5361, .... This last number, 
which is .v, , is in fact identical with x 2 and further iteration 
will merely repeat the values for x 3 to .y 10 . Even though 




we have used four decimal places, all we can say about the 
root is that it is near 0-54. Equation (2.23) shows clearly 
the way in which significant figures arc lost in intermediate 
calculations. Similarly if five decimals arc used in the 
calculation it will be found that the iterates vary between 
0-536 50 and 0-537 50 in general, and we can say that the 
root is approximately 0-537. There is no guarantee that the 
correct answer lies between the upper and lower limits 
within which the iterates fluctuate, or even that if an 
iteration apparently converges that it converges to the 
correct answer. Thus in the above example if we iterate 
with .v = 0-50 using four decimal places, we find in 
succession 0-5156, 0-5244, 0-5263 and all further iterations 
repeat 0-5263, but we know that the correct answer in this 
case is 0-5270. 

To sum up this discussion: in order to calculate any 
root to a specified degree of accuracy an essential and 
unavoidable degree of accuracy is necessary in the inter- 
mediate calculations and this may be much greater than the 
final accuracy of the root. 

The change Sx in a root x produced by a change 8a r in 
a coefficient a r can be examined theoretically as follows. 
The quantity x+5x is a root of the perturbed equation, so 

P n (x+5x)+8a r (x+8xy- r = 0, 

Hence to first order, since P„(x) = 0, 


P&x)8x+5 ar x n - r = 0, 
8x = -8a r {x"-'IPXx)}. 


When {x"~ r IPI,(x)} is large for any root, then by our previous 
definition this root is ill-conditioned. Of course the linear 
approximation on the basis of which (2.24) is derived 
breaks down for ill-conditioned roots. Equation (2.24) is 
not quantitatively accurate for the polynomial (2.22), for 




example, but the qualitative indication of ill-conditioning 
is correct. Formula (2.24) indicates that the smaller roots 
tend to be better conditioned than the larger roots since, 
if X; is the /th root, the power x"~ r usually increases more 
rapidly with i than PJ(x,). 

Some instructive illustrations of the application of (2.24) 
arc given by J. H. Wilkinsonf from whose paper the follow- 
ing examples are quoted. In this paragraph we assume that 
x lt x 2 , ... x„ represent the n roots of a polynomial of" 
degree n. (Elsewhere in this chapter x, denotes the rth 
approximation to a root in an iterative procedure.) If we 
consider the polynomial equation 

^o(-v) = (-v+ 1X-V+2). . .(.v+ 20) = 0, (2.25) 

with roots .v, = -s, (s = 1 to 20), we find for a change 
«5a, in a, : 

8x 20 a 0-4 x 10 8 &7,, (5.x, 5 w 0-2 x 10 ,0 &i„ 8x 5 « 0-6<5a,. 

The roots .*, to x s are well-conditioned, but the conditioning 
is extremely poor for the large roots. If we consider the 
polynomial equation 

(.•c + l)(x+2)...(x+20)+2-"x 19 = 0, 

which differs from (2-25) by about 10" 7 in a u the roots are 
x t = — s to nine decimal places for s = 1 to 4, but .v 10 , x tl 
are — 10-10±0-64/, .v 15 ,.v 16 are —13-99 + 2-52/, and. v I9 ,a.- 2 o 
are — 19-50+1 -94/. Among other things, this example 
illustrates in a remarkable way that some roots can be 
ill-conditioned and others well-conditioned, in the same 

Although it is true that ill-conditioning is usually 

t The evaluation of the zeros of ill-conditioned polynomials, 
Numcrische Mathematik, 1 (1959), 150-180. This excellent paper 
will repay careful study. 



8 2.7 

associated with the occurrence of nearly equal roots, which 
in turn implies that f'(x) is small, this remark should be 
interpreted with care. Thus the equation 

.x 20 + l =0, 

which has 20 roots, all of modulus unity, is comparatively 
well-conditioned. This can be confirmed by using (2.24). 

§ 2.7. Bairstow's method for finding complex roots of 
polynomials. In this section we extend the Newton-Raphson 
method for real roots (§ 2.5) to give formulae convenient 
for dealing with complex roots. Suppose that w is an 
approximate complex root of P n (x) = 0, where the poly- 
nomial has real coefficients. Consider the quadratic 
factor (x— w)(x— w) = x 2 — px—q, say, where w is the 
complex conjugate of w, so that p, q are real. We write 
(cf. (2.19), (2.20)) 

P„(x) = (x 2 -px-q)S n . 2 (x) + Ax+B, (2.26) 

S„_ 2 (.x) = (x 2 - P x-q)T n _ A (x) + Cx + D. (2.27) 

Hence P„(w) = Aw+B. Also 

P#e) = _2x- P )S,,. 2 (x) + (x 2 -px-q)S' n . 2 (x) + A. 

Since p = w+w this gives Pfov) = (w-w)(Cw+D) + A. 
From the Newton-Raphson formula the correction Sw to 
the root w is 

Sw = — 

Aw + B 


In general Sw is small, so that the numerator of this expres- 
sion is much less than the denominator and this implies 
that the term A in the denominator can be neglected. 


5 2.7 



Hence we can write 

Sw = — 




When we examine the division of P n (x) by a quadratic 
factor x 2 —px—q in detail, it is found convenient to modify 
the above formulae in the following way. Suppose that in 

P n (x) = a x"+a l x''- , + ...+a„, 

S„_ 2 (.v) = b x"- 2 + b l x"- 3 + ... + b n . 2 . 

Then from (2.26), b = a a and 

b i = "j+P&o. 

b r = «,+/>!>,_, + gb r _ 2 , (2£r£«-a (2-29) 

A = a.-j+l»^,-a+«&»-3. B m a n +qb„. 2 . 

If we extend (2.29) to hold for r — n- 1 and n, thus defining 
&„_, and b„, we see that 

A = b„. i , B = b„-pb„. 1 . 

Similarly suppose that when we divide x 2 —px—q into 
S„~ 2 (x) to give (2.27) the coefficients of 7;_ 4 (a) are c , c u 
... c„_ 4 , and we define c n _ 3 , c n _ 2 by the same rule as for c r 
(r = 2 tow -4). Then we find 

C=c n _ 3 , D = c n . 2 -pc„- 3 . 

On using these results in (2.28), remembering that w+w = p, 
we obtain 

Sw = — 


0v-n'Xc,,_ 3 >v-c._ 2 ) 







The computational scheme for the b r and c r is as follows 
(cf. (2.21)): 

a a, a 2 a 3 

pb pb y pb 2 

qb qb x 

K *l *2 b i 

pc pc t pc 2 
c c, c 2 c 3 

a«-2 « n -i a n 

pb„-3 pb„- 2 pb„-i 

Rb n . i <//>„. x qb, , 

l'„-2 *»-i 

b„ (2.31) 

Instead of working in terms of the correction Sw to the 
root it is convenient to work in terms of dp, Sq, corrections 
to the quadratic factor. We have 

(.v-H'-<5iv)(.x-M'-<$iv) = x i — (p + Sp)x-(q + Sq), 

so that 

Sp — Sw+8w, 

Sq b — n'<5iv — u»5»v — ^w^iv. 

On neglecting the second-order term in the last equation 
we find, on substituting from (2.30), that 


<5p = (fc„c„_3-/) n . lCn _ 2 )/A, (2.32a) 

<5<7= -(b n c„- 2 -b„_ t E)IA, (2.32b) 

A - C 2 _ 2 - Cfl _ 3 £, (2.32c) 

E = pc n - 2 + qc„. 3 . (2.32d) 

These are Bairstow's formulae. They are equivalent to the 
following simultaneous linear equations for Sp, Sq: 

c„. 2 5p + c„- 3 Sq = -*>„_„ (2.33a) 

ESp+c„. 2 5q = -b n . (2.33b) 


The derivation given here has shown the very close relation- 
ship between Bairstow's formulae and the Newton-Raphson 

Bairstow's method depends on locating a quadratic 
factor approximately before iterating to obtain the accurate 
factor. It should be noted that methods such as root- 
squaring (Graeffe's process) and the Aitken-Bernoulli 
process are available for calculating complex roots of 
polynomials even though no initial approximations are 

Ex. 2.6. Find the complex roots of smallest modulus for the 
polynomial equation in Ex. 2.5, to four decimal places. 

As an approximation to the required quadratic factor 
we take the last three terms of the polynomial which give 
(.v 2 + 01 287a- +01 229). The procedure in the table (2.31), 
omitting the second, third, fifth, and sixth rows, which do 
not need to be recorded, gives 

100000 2-65300 4-51200 -2043 00 -0-26300 -025100 
1000 00 2-524 30 4064 22 -2-876 30 -0-392 31 152 99 
100000 2-395 60 3-63301 -3-63829 

From (2.32) we find 

£ = 002175, A = 13158 14, 

Sp = -0 066 23, Sq = 041 65. 

On computing the new/), q and repeating the procedure we 

P a K-\ K 

—0194 93 -008125 0003 26 -0006 83 

-0-195 74 -0-083 12 -0000 01 +0000 03 

Further iteration indicates that /;, q arc accurate to one unit 

t Sec, for instance, Modern Computing Methods, H.M. Stationery 
Office, 2nd Edn. (1961). 



J 2.7 

in the fifth decimal place, and solution of the corresponding 
quadratic gives the complex roots 0-0979 + 0-27 12/, to four 
decimal places. 

Examples II 

Ex. 2.7. Find the real roots of the following equations 
to four significant figures 

(i) x = cos x, 
(ii) .v 2 -l = sin a, 
(iii) X» = 5. 

Ex. 2.8. The equation x* + 1-5.\- 1-5 = can be written 
in the forms 

(i)x = l--fx 3 , 
(ii) x = l-5(l-5 + x 2 )-\ 
(iii) x = (l-5-l-5x)*. 

On applying the method of § 2.2 find which of the corre- 
sponding iterative procedures are convergent. Hence 
determine the real root of the equation to four decimal 

Ex. 2.9. If the root of the equation cot .v = kx between 
and \k is small show that it is given approximately by 
x«(Jfc+|)~*, where k is large. If the root is near {it show 
that it is given approximately by x«in(l +k)~ l where k is 
small. Use these results as first approximations to calculate 
the roots of the following equations to the appropriate 
number of significant figures, assuming that the coefficient 
of -v in each case is correct to the last digit given: 

(i) cot x = 111-lx, 
(ii) cot a: = 1-lllx, 
(iii) cotx = 0011 llx. 




Ex. 2.10. Use the Newton-Raphson method to obtain 
the following iterative formulae for x = a 1 '", (a>0): 

f{x) m x"-a, x r+1 = n- l {(n-i)x,+ax}- n }, 

fix) = t-(«/x"), x r+1 - »- , {( n +l)x r -<^ 1 x:: +, }. 

If x = x r +5x r show that in each case 5x r+l = /» r 5.v 2 for 
suitable p r . Deduce that if n> 1 or«< — 1 the two sequences 
approach x from opposite sides. Illustrate these results 
by calculating n*, «""■ by the two sequences, as accurately 
as possible, given n = 3141 593. 

Ex. 2.11. Show that x r+ , = (x r 3 + 3ax r )/(3x 2 + a) is a 
third-order procedure for the calculation of yja. Illustrate 
this result by calculating ^11 to nine decimal places, using 
x = 3. (Compare Ex. 2.2.) 

Ex. 2.12. Let ;•, = /,(*), y 2 = f 2 (x). Show that the 
chord joining (x,, ;■,) to (x 2 , y 2 ) cuts the x-axis at 

This is the basis of the well-known method of false position! 
for finding the roots of/(x) = 0: the value of y 3 =/(xj) 
is computed and the procedure is repeated using (x 3 , y 3 ) 
instead of either (x,, ;•,) or (x 2) y 2 )- It is convenient to 
choose the two values of x at each stage so that they lie on 
opposite sides of the root. The method is essentially inverse 
linear interpolation. Use this method to solve the equations 
in Exs. 2.1,2.3. 

Ex. 2.13. A method due to J. H. Wcgstein:!: which 
improves the convergence of the iterative procedure of 

t E. T. Whittaker and G. Robinson, Calculus of Observations 
Blackie (1944), p. 92. 

X Comm. Assoc. Compui. Mach. 1 (1958) No. 6, p. 9. See also 
G. N. Lance, Numerical Methods for High Speed Computers, IlilTe 
(1960), pp. 134-138. 


§ 2.2 for solving the equation x = F(x) is given by 

X n+l^n-l~^n X n 





-^n+l — 

(n = 1, 2, ...), 

x n+l~ ■*n + ^'n-l x n 

where x„ +1 = F(X n ), and X , X t are arbitrary. Show that 
this procedure is simply a systematic version of the method 
of false position described in Ex. 2.12. 

Ex. 2.14. Consider the following sequential calculation. 
I f we define 

a =\, fc = (l + .v 2 )*, 

", + . = i(fl r +b r ), b, +l = frfed*, r = 0, 1, 2, .... 

then it can be shown that, as r tends to infinity, 

lim a r = lim b, = x/arc tan x. 

This is not an iterative method of the type considered in 
this chapter since the limit depends on the starting values. 
In particular it is vital to avoid mistakes in the above 

Ex. 2.15. Let ±x u ±\,, ±x 2 , ±x 2 denote the roots of 

x 8 -2(2p 2 -0-3)x 6 +(ie*+6p 4 -6p 2 +3-73).x 4 

-2p 2 (2p 4 -3-7p 2 + l-7)x 2 +pV-l) 2 = 0. 

Calculations on an automatic computer gave the following 
results (Q = 100): 

forp = 4, x, = 50- 159 + 49-842;, .v 2 = 0-155 39H 0-15445/, 

forp = 10, xi = 51 017 + 49022/, .v 2 = 1-0139 + 0-9745/. 

Veiify that for the smaller roots the formula 

e{i+ 6 ' 2(p2 - 

> 2 -p 1 


-<^'-^- 2j mr} 

gives results correct to 1 in 1 5,000 for p = 4 and 1 in 5000 
for p = 10. Derive a formula accurate to 1 in 10,000 for 
the larger roots. (The moral of this example is that even 
though numerical results can be obtained by an automatic 
computer this should not blind one to the possibility of 
obtaining an adequate approximate answer in analytical 



§ 3.1. Introduction. For the purposes of this book it is 
unnecessary to go into detail regarding the structure of 
electronic digital computers; it will suffice to make some 
general remarks about the way in which an automatic 
computer works. An automatic computer has to be told 
precisely, and in detail, the steps which have to be performed 
to solve any given problem. From this point of view a 
calculation consists of two parts: a series of instructions 
specifying the method to be used, and a sequence of numbers 
leading from the data of the problem through various steps 
in the computation to the answer. It will be assumed that 
instructions and data are punched on cards which have to 
be fed into the machine, and that the results of a calculation 
are printed out by the machine on an output sheet of paper. 
For our purposes a digital computer can be considered to 
have the following components: 

(a) A memory or store in which numbers and instructions 

can be stored and from which any required 
number can be produced at will. Each number 
in the memory is allocated a memory position or 
address which can be regarded as a pigeonhole in 
which the number is stored. 

(b) An input mechanism for transferring the information 

on the input cards into the memory. 





(c) An output mechanism for transferring information 

from the memory onto the output sheet. 

(d) An arithmetic unit for carrying out simple basic 

arithmetical operations such as addition, subtrac- 
tion, multiplication, division. 

(e) A control unit which organises the calculations, i.e. 

arranges for the input and output of information 
and the execution of arithmetic operations in 
the correct sequence as specified by the instruc- 

The set of instructions for solving a problem is called 
the program. When solving a problem the computer 
begins by storing the complete program in its memory 
via the input mechanism. It then proceeds to obey the 
instructions in an order determined by the program. 

The basic instructions which a machine obeys are 
usually very simple. For instance it will need at least one, 
and perhaps several, basic orders to execute the operation 
" Add the two numbers contained in positions a and b in 
the memory and place the result in memory position c ". 
The reader will appreciate that to program a problem in 
terms of " basic machine language " would require con- 
siderable background knowledge of individual machines. 
It would also be very laborious. Fortunately there is an 
increasing tendency to produce systems by means of which 
a programmer can write his program in a relatively simple 
language resembling ordinary algebra for the formulae and 
ordinary terminology for the instructions. This simple 
language must of course be interpreted in terms of, or 
translated into, basic machine language before the computer 
can solve any given problem, but this interpretation or 
translation can be done by the machine itself using a 
" compiler ". The programmer need only prepare his 
program in the simple language. We shall describe a simple 
system of this type, devised for teaching purposes. 



9 3.2 

§ 3.2. Simple programs. Consider the evaluation of 

! = 


J{R 2 + [2nfL-U(2nfC)Yy 


for various sets of numbers R, L, C, f. It will be assumed 
that each set of R, L, C, f is punched on a separate input 
card. A program for this calculation can be written in the 
form of five statements as given in Program 3.1. (Instead of 
" statement " the words " instruction " or " order " can be 
used, if preferred.) 


Label Statement 

9 Read/?,L, C,f 

II = 6- 2832 fL- 1/(6- 2832 fC) 

Print/, R,L, C,f 
Go to 9 

The first statement makes the computer read the first 
card containing input data. Four numbers are on this 
card in the form of four sets of punched holes. The 
machine associates four memory positions with the symbols 
R, L, C,f and stores the four numbers from the input card 
in the corresponding memory positions. From now on, 
instead of saying " The number in the memory position 
associated with the symbol R " we shall say " The number 
associated with R " or, shorter still, " The number R ". 

The second statement says " Perform the calculation on 
the right-hand side of the equation, using the numbers just 
associated with L, C, f. Store the result in a new memory 
position associated with the symbol H ". The third state- 
ment has a similar meaning: it computes and stores /. 
These two statements could of course be written as one 




single statement, but it is slightly clearer to write them 

The fourth statement makes the machine print the 
computed value of i and the values of R, L, C/from which 
/ has been derived. 

The last statement transfers control to the statement 
numbered 9; in other words it instructs the computer " Go 
to the statement 9 and do what it says: then perform the 
statement following 9. and so on". In Program 3.1 this 
means that the computer reads the next input card, which 
contains four new values of R, L, C, f. The computer 
replaces the old values in its memory by the new values and 
repeats the sequence of operations. The old values are lost 
completely when they are replaced by new values. 

Instructions concerning numerical calculations, like the 
second and third statements in Program 3.1, are called 
arithmetic statements. The other instructions, which 
organise the numerical calculations, are called control 

The arithmetic statement 

a = (A + 2-431)/j- 


says "Compute the number a by adding 2-431 to x and 
dividing by y ". Hence x and y must have occurred 
previously in any program in which this statement occurs, 
since the machine must go to memory positions already 
allocated to the symbols x and y in order to find the 
numbers represented by these symbols. The precise 
meaning of the statement depends on whether the symbol a 
has occurred previously in the program. If a has occurred 
previously there will be a memory position already associated 
with a. The meaning of (3.2) is then " Obliterate the 
number in the memory position associated with a and 
replace it by the number computed from the right-hand side 
of the equation". If a has not occurred previously the 
meaning is " Allocate a memory position to the new 



8 3.2 

symbol a and place in this memory position the number 
computed from the right-hand side of the equation ". It 
is clear that, although arithmetic statements resemble 
ordinary algebraic formulae in appearance, the " = " sign 
is used in a different sense. No confusion can arise from 
this because programs contain only statements (i.e. instruc- 
tions), never algebraic formulae. 

The rules governing arithmetic statements can be 
summarised as follows: 

(i) It is essential to have only one symbol on the 
left-hand side of a statement. (For instance (3.2) 
cannot be written ay = x +2-431.) This symbol 
may have suffixes, e.g./,,/ r , a,j. If the suffix r 
is a variable parameter, for example if /• = 1, 2, 
... «, the value of r must have been specified 
earlier in the program. 

(ii) Any symbol on the right-hand side of an arithmetic 
statement must have occurred previously in the 
program. (Incidentally this is why we have 
written 6-2832 in Program 3.1 instead of 2n since 
the symbol n has not been defined in the program. 
We could of course have written n = 3- 14 16 as 
the second statement and then the third statement 
could have been H = 2nfL- ]/(2nfC).) 

(iii) Apart from (i) and (ii) any arrangement of symbols 
or notation which is self-explanatory according to 
the ordinary rules for writing and interpreting 
algebraic formulae will be allowed. 

We next consider an extension of Program 3.1. Suppose 
that for any given set of R, L, C we wish the machine to 
compute first the value of/ for which i is a maximum, 
namely, from (3.1), 

/' = 


2rr(XC)* - 




and then evaluate i for 

f=rh, r= 1(1)«, h = F/n, 

where n is a given integer. The notation r = l(l)n, which 
we have already met in Ex. 1.4. means that r starts at unity 
and increases by steps of unity until n is reached. A 
suitable set of instructions is given in Program 3.2 which 
contains only one new type of statement in addition to 
those in Program 3.1. 

Programs 3.2(a) and 3.2(b) compute the same quantities. 
We first of all consider 3.2(a). The interpretation of the 
first three statements has been covered by the discussion of 
Program 3.1. The fourth statement 

Do up to 3 for r = 1(1)/? 

instructs the machine " Perform all statements which 
immediately follow, up to and including the statement 
numbered 3, n times, the first time for r = 1, the second 
time for /• = 2, and so on, and the last time for r = n. 
Then go on to the statement following statement number 3." 
The first time round, the machine therefore computes and 
stores/,, //,, /,, the second time/ 2 . H 2 , i 2 , and so on until 
finally /„, //„, /„ are computed and stored. When this is 
done the computer will print the/j and i t (.? = 1 to n) and 
/?, L, C. The part r = 1(1)h of the " Do up to " statement 
obeys rules which are similar to those governing an arith- 
metic statement, namely that the symbol on the left may or 
may not have occurred previously in the program, but any 
symbols on the right must have occurred previously in the 
program. Compared with Program 3.1 the " Print " order 
in Programs 3.2(a), (b) has been extended to print a sub- 
scripted variable with the range of suffixes clearly indicated. 
Commas are used to separate independent variables but the 
word " and " is used to separate subscripted variables with 
l he same range. 







The numbers " 9 " and " 3 " in the " Label " column 
of Program 3.2(a) are simply identifying tags attached to 
certain statements, so that these statements can be referred 


Ladfi. &TATEMBN i 

9 Read R, L, C, n 

F=0-I5915 (LC)-* 

// = F/n 

Do up to 3 for r = 1(1) n 

/, - rh 

//, = 6-2832/,!- 1/(6- 2832 f r C) 
3 /, = (/? 2 +//,*)-* 

Print R, L, C, //,/, and /, for s = 1(1) n 

Go to 9 


Labh. Statement 

9 Read R, !, C, n 

F = 0- 15915 (!C)-* 
li - /> 

Do up to 3 forr = 1(1) n 
/ = /+// 

// - 6-2832/!- 1/(6- 2832 fC) 
3 J r = (/? 2 + // 2 )-* 

Print fl, !, C, ff, F, /;, /, for s = 1(1) n 
Go to 9 

to in control statements such " Go to " and " Do up to " 
in tliis example. For labels we use positive integers. These 
need not be in increasing order of magnitude. As labels, 

instead of numbers we could equally well have used Greek 
or Roman letters. 

In Program 3.2(a) the machine stores H r (r = 1 to n) al- 
though these are used only for immediate calculations, and 
are not required after the corresponding i r have been found. 
Storage space can be saved by arranging the computation 
as in Program 3.2(b) where for illustrative purposes we 
have decided that storage and printing of all the/, (r — Hon) 
is also unnecessary. Consider the statement 

This means " Take the number in the memory position 
associated with / and add /; to it. Place the result in the 
memory position associated with /". The old number 
which was in/ is overwritten. Any subsequent reference to 
/ applies to the new number which has just been stored in 
the memory position associated with /. Program 3.2(b) 
should now be self-explanatory. 

Suppose finally that in addition to the calculation just 
considered we wish to evaluate /, for 

f r = rh, r = «+1,m + 2, ... W, 

where Wis defined by the inequalities 

l N <plR£t„. u (p<l, given). (3.3) 

The maximum value of /, is \jR, and i r is monotonically 
decreasing for r>n, so that we wish to compute i r for 
increasing r until the value of/, falls below some prescribed 
fraction of the maximum value of /. This can be done as 
in Program 3.3 where the first eight statements are the 
same as in Program 3.2(b) apart from the addition of a 
parameter to the " Read " statement. 

The point of this example is that the number N in (3.3) 
is not known beforehand. The machine has to determine 
N by computing /, for s = n+ 1, n+2, ..., each time com- 
paring the current value of i a with p/R until i,<pjR. This 







procedure is controlled by an " If" statement: 
in^p/Rgoto 14 

which instructs the machine: " If i s ^p/R transfer control 
to statement 14 as in a 'Go to' statement. If i a <p/R 
perform the statement immediately following the ' If ' 
statement, namely ' Print . . . '." In the second half of 
Program 3.3 we have used s, g instead of r, f. This is not 
essential but it emphasises that the program consists of two 
distinct parts. The value of s which is printed is the value 
assigned to the variable s when the " Print " statement is 
reached, and this is the same as the number N in (3.3). 

The possibility of programming Yes-No decisions by 
means of " If" statements is one of the distinctive features 
of performing calculations by means of an automatic com- 
puter. The " If" statement allows the machine to control 
its own calculations. 

From our point of view the main difficulty in program- 
ming does not lie in the arithmetical calculations themselves 
but in the organisation of the sequence of calculations. 
It is often helpful to draw a flow-diagramf to clarify the 
logical structure of the program to be used. This is 
illustrated for Program 3.3 in Fig. 3.1 which should be self- 
explanatory. A flow-diagram consists of boxes connected 
by lines. The boxes are essentially of two types. A test 
box or If-box has one input line and two output lines. The 
output line which is selected depends on whether the answer 
to the question in the box is " yes " or " no ". Any other 
kind of control or arithmetic statement is placed in an 
assertion box. It is convenient to make test boxes oval 
and assertion boxes rectangular, as in Fig. 3.1. In contrast 
to the relatively limited kinds of statement used in programs, 
any kind of statement is allowed in flow-diagrams as long 

t An excellent collection of flow-diagrams (or flow-charts) is 
contained in Mathematical Methods for Digital Computers, A. Ralston 
and H. S. Wilf, Wiley (1960). 

as the meaning is clear. A flow-diagram tends to be 
descriptive but the logical interconnections are described 
precisely. Flow-diagrams can vary from description in 
broad outline to description in detail. At the one extreme 
the flow-diagram will consist of almost equal numbers of 

Label Statement 

9 Read R, L, C, n, p 

F = 0- 15915 (LQ-* 
1 1 = Fin 
Do up to 3 for r = 1(1) n 


H = 6- 2832 fL- 1/(6- 2832 fC) 
3 i r = l/(/? 2 + // 2 )* 

x = n 

9 m "'' 

14 .V = 5+l 

9 = 9 +h 

II = 6-2832 gL -1/(6' 2832 gC) 

If i^p/R go to 14 

Print s, i, for t = 1(1) s, R, L, C, n, p 

Go to 9 

assertion and test boxes, with little detail in the assertion 
boxes. At the other extreme the programmer can convert 
the flow-diagram into a program directly by merely removing 
the lines and boxes, and inserting labels and " Go to " 
statements as necessary. In the flow-diagram in Fig. 3.1 
the control instructions are given in detail but no arithmetic 
information is given. 










s another inputN Nn. 
card available'.',-' 



Read in the next 
values of R, A, C, «. /> 

Compute F. h 

Compute /,, r = l(l)n 


Scl i = n 


Increase s by 1 

Compute /, 

?\ v« 

Mill a, /,(; - I lo v). It. A. C, n. I' 

Fig. 3.1 Flow-diagram for Program 3.3 

§ 3.3. Some programs involving iterative procedures. The 
problems involved in programming an iterative procedure 
for an automatic computer can be illustrated by considering 
the solution of equations by means of the Ncwton-Raphson 
procedure. From §2.3, equation (2.10), the equation 
f(x) = is solved by means of the iteration 

.v r+1 = x r -f(x r )!f(.x r ), 

where x r is the rth approximation to the root x. 

It is necessary to start by obtaining initial approxima- 
tions .v for each of the roots which we wish to determine. 
It may be possible to locate the roots roughly by hand 
calculation and include the resulting estimates in the input 
data for the machine, but usually it is better to program 
some procedure by means of which the machine itself can 
make initial estimates of the roots. 

The next step is to program the machine to perform the 
iteration. There are then two possibilities. The procedure 
may converge satisfactorily in which case the machine can 
be made to iterate until the difference between successive 
iterates is sufficiently small. Alternatively, the procedure 
may not converge or may converge extremely slowly, in 
which case it is necessary to alter the iteration in some way 
or, if desired, stop the iteration altogether. 

A useful criterion for convergence is the following.! 
We define e, = x r+l — x r . When an iterative procedure is 
converging towards a root we shall have | e r+ , | < | c, | 
until the e r become so small that they arc affected by the 
fact that only a finite number of decimal places are used in 
the calculation. In simple computations we may find that 
x r = x r+ i = A- p+2 sothat£ r = c f+1 = 0. If the calculations 
are appreciably affected by round-off error, however, 
random error fluctuations will eventually give | e r+ , | > | e, j 
for some r. (Compare the discussion of random error 

t This was pointed out to me by Dr D. J. Wheeler. 




fluctuations in § 2.6.) In either case we can use the con- 
dition | e,+ i | ^ | e, | as a criterion that the calculation has 
converged to the degree of accuracy allowed by round-off 
error. The level of random variations can be regarded as 
a noise-level limiting the accuracy of the calculation. 

An even simpler criterion can be used if we know that 
the .v, vary monotonically with increasing r. Suppose for 
example that theoretically .v r+ , > x, for all r. Since we are 
dealing with numbers specified to a finite number of 
decimal places we shall eventually have x r+l £x r due to 
either repetition of the iterated value, or round-off error 
and the noise-level of the calculation. This condition 
can therefore be taken as a criterion of convergence. 

Some of these points are illustrated in the following 

Ex. 3.1. Program 3.4 gives a straightforward routine for 
finding the square root of a positive real number a by 
means of the following formula (Ex. 2.2): 

*+| = M*r + («/*r)}- 

If the machine is given a negative value of a by mistake 
this formula will give nonsensical results. In this case we 
make the machine stop. The value of a has been printed 
out, and this will indicate the source of the trouble. (This 
is the first time we have used the self-explanatory " Stop " 
order. We might have used instead an " Alarm " order 
to stop the machine and warn the operator that something 
has gone wrong with the calculation.) As an initial estimate 
of the root we take unity and a single iteration gives 
•Vo = iO + fl )- We have 

Xr-yja = lfe-i-V4fe-t2ft rm^UX ■••. (3-4) 
where for r = we must define x_j = 1. Hence x,^Ja 
and x?7>a. (This is the reason for choosing .v = 1(1 +a). 
If we choose .\ = 1 it would no longer be true that xf^a 




for all r^O.) Equation (3.4) proves that the procedure is 
second-order. From (3.4) 

(x-Ja) - |§A— jt|fc. r ^ 

Hence the procedure is convergent if 


x, _ 



Labfl Statement 

Read a 

Ifa^Ogo to 12 
Print a 
12 x-^l+fl) 

2 i/ = x 

X = K.v+(a/*)) 
If .v<« go to 2 
Print x, a 

Since .v,_ j ^ v 'n for all r the procedure is always convergent. 
We have also 

so that successive iterates decrease until the procedure 
converges or reaches the noise-level of the calculation. 
This means that we can use the condition .r r+l S;.v r as a 
criterion for convergence. After the " Print x, a " order 
the machine goes on to the next part of the program. 

One disadvantage of Program 3.4 is that if a is very 
different from unity then the procedure is initially somewhat 




slowly convergent. This is avoided in Program 3.5 where 
by successive multiplication or division by 16 we ensure 
that the iterative procedure is used only on numbers 
between £ and 4. To obtain an accuracy of say 10" 10 it 
is readily verified that no more than five iterations are 
necessary and this fact is used in the program. The para- 
meter r in the " Do " statement is used merely to count the 
number of iterations and it does not occur in any other 
statement. Note that the case a = has to be considered 

One way in which this example is unrealistic is connected 
with the efficiency of the method. The square root opera- 
tion will be provided as a standard routine for any machine 
and it will be used many millions of times per year. If we 
can save one millisecond in the time taken by the square 
root program this will save about 17 minutes of computer 
time every million square roots that are computed. Such 
economies can quickly build up into a substantial saving 
of computer time. Hence it is worth while devoting a 
considerable amount of effort to developing an efficient 
square-root program. The basic machine language will be 
used and advantage will be taken of any special features 
which allow time to be saved. The way in which a computa- 
tion is programmed will then depend to a large extent on 
the way in which calculations are performed by the arith- 
metic unit inside the machine. Thus if all numbers inside 
the machine must be less than unity in magnitude and 
0<a< 1 we cannot calculate i(l+a) by first of all forming 
1+a, although i + Ja would be permissible. However, 
considerations of this kind lie outside the scope of this 

Ex. 3.2. Draw a flow-diagram for Program 3.5. 
Ex. 3.3. Find the real roots of 

P n (x) = fl x n + a 1 x"- , + ... + a n = 0. 




Label Statement 

Read a 
Print a 

Ifa^Ogo to 12 
12 k = 

If a>4 go to 3 
If a<£ go to 4 

6 « = HI +«) 

Do up to 5 for r = 0(1)4 
5 u m i(« +(«/«)) 

Go to 8 

3 a = a/16 

If a>4 go to 3 
Go to 6 

4 Ifa>0goto7 
.v = 

Go to 9 

7 a = 16a 
k = k-l 
Ifa<igo to 7 
Go to 6 

8 x = 4"u 
a m 16*a 

9 Print x, a 







In this example we shall use .r,, x 2 , ... x m to denote the 
real roots of this polynomial. (Elsewhere we have used 
x r to denote the rth approximation to a root .v.) Suppose 
that the following procedure is used. We first of all find 
the root of smallest absolute value a - , and then divide 
PJ. X ) by at-.v, giving a polynomial P„-i(x) of degree «— 1. 
We then find the next smallest root x 2 and divide / > „_,(.v) 
by x-x 2 obtaining P„^ 2 (x) and so on. This sequence is 
repeated until we obtain .v,, x 2 , ... x m and / 5 n _ m (.v) where 
this final polynomial has only complex roots. It is found 
that this method is very satisfactory numerically if the 
well-conditioned roots are found and removed first. It is 
for this reason that the roots arc found in the order of their 
size, starting with the smallest, since the smaller roots tend 
to be better conditioned than the larger. This has already 
been mentioned in connection with (2.24). When finding 
roots by factor-removal methods it might seem desirable 
to try to avoid error build-up due to errors in the roots 
which are found first by performing final iterations on the 
original polynomial. However, this can be inconvenient 
and it is not usually necessary if the well-conditioned roots 
are found and removed first, since a badly conditioned root 
is difficult to determine accurately whether we iterate on 
the original polynomial or on a reduced polynomial. 

The roots will be found by means of the Newton-Raphson 
procedure. This means that it is necessary to estimate first 
approximations for the roots. For the first approximation 
to *,(/•> 1) the smallest root of P„. r+1 {x) we shall choose 
x r _j, the value of the smallest root of P„- r+2 i. x ) which has 
just been found at the previous stage of the calculation. 
This is a reasonable procedure since if we find the smallest 
root at the first stage of the calculation we are likely to find 
the roots in increasing order of size. Also in the case 
where there are two roots close together, which is a trouble- 
some case, the first of the two roots will give a close initial 
estimate for determination of the second. 

To start the procedure we need an estimate of at, which 
we shall denote by (jc,) . This will be found in the following 
way. If | a n _ , | ^ | aj we set (a.-,) = -aja„. , ; otherwise 
we set (a,) = 1. To understand why we choose (*,)(> in 
this way we recall that (§ 2.5 (iii)) 


i-1+A+...+ A. 

A large value of | a„. Ja„ | indicates that at least one of the 
Xj is small, and it is reasonable to estimate the smallest root 
by choosing (x,) = — a n /fl„_ ,. On the other hand a small 
value of | a„.i/a„ | does not necessarily indicate that the x, 
are large, since cancellation may occur. A simple example 
is x a +0-0001x-l = where -aja n -i = 10,000 but the 
roots arc of order unity. It would obviously be unreason- 
able to take (.v,) = -a„/fl n _, in this case. If we choose 
(*i)o = 1 when I a«-i I < I o n | then if the smallest root is 
much greater than unity the Newton-Raphson procedure 
should still converge to it, though initially the rate of con- 
vergence may be slow; and the procedure will converge to a 
small root if this exists and |a n _,/a n | is small merely 
because cancellation has occurred. 

The problem of how to define (x,) illustrates one of the 
difficulties of automatic computation. The program must 
tell the machine exactly what to do. Ideally we should like 
to be certain that the value which we tell the machine to 
take as (.v t ) is a reasonable approximation to the smallest 
root. But we can always invent examples for which any 
specific criterion fails. The percentage of failures can be 
reduced by giving the machine more complicated rules for 
the determination of (X|) . We have to compromise between 
the difficulty of programming more complicated procedures, 
and the desire to ensure that (a-,) is a good approximation 
to the smallest root of the polynomial. The suggested 
procedure will not always locate all the real roots. This is 
partly because we arc writing a general program to solve a 




wide class of problems. It is not surprising that we can 
invent examples for which the program will not work. On 
the whole, the more specific the problem the easier it is to 
write a program such that it will be virtually certain that an 
automatic computer will obtain the correct result. 

Program 3.6 gives one method for carrying out the 
previously described procedure for finding the real roots of 
a polynomial. A flow-diagram for this program is given in 
Fig. 3.2. Starting at the beginning of the program, when 
statement 3 is reached the value (.v,) defined above has 
been assigned to a variable u. After defining the values of 
various symbols the basic iterative procedure for finding the 
correction 8 is performed in statements 7 to 11, using the 
method and notation described in some detail in §2.5. If 
| 8/u | > 10" 4 it is assumed that the process has not yet 
started to converge and the machine repeats the iteration. 
(The figure 10~* is chosen arbitrarily as a figure much 
larger than the accuracy of the calculation which is, say, 
1 in 10" 9 , but small enough so that when | 8/u | < 10~ 4 we 
can be reasonably certain that the procedure is converging. 
Depending on circumstances it may be more convenient to 
work in terms of absolute instead of relative values, say 
| 8 | <10~ 4 indicates convergence.) If | 8/u | <10~ 4 on 
the first iteration the program defines a variable £ = 8 and 
then repeats the iteration. In general the program sets e 
equal to the old value of <5 before starting an iteration to 
find a new <5. If the iteration is converging normally we 
have | 8 | < | e |. We assume that when, after a period of 
normal convergence, | 5 | — | £ | the process has converged 
and the root has been determined to the maximum possible 
accuracy for the method used. The current values of u 
and 8 are printed out, where <5 is the next correction to u 
and gives some idea of the " noise-level " of the calculation. 
(This idea could be extended by asking the machine to 
perform say four iterations beyond the point where 
| 8 | — | £ |, and record the maximum and minimum values 




Label Statement 

Read a, for i = 0(1) n 
Print a, for t = 0(1) n 
u = 1 
If|fl n _, | < | a„ | go to 3 

" - -<hja.-i 

3 b = a 

Co = «u 
/- 1 
r = 
go to 7 
£ = 5 
u = u+S 

Do up to 10 for m m 1(1) n-j+l 
K = a m +b m - x ii 
c m = b m +c m . l u 

6 = - b n-J-n/ C «-j 

If |5/i/| <10-*go to 12 

r = r+l 

Ifr<20go to 6 

Print u, 8, a, for i = 0(1) n-j+l 
20 Stop 

12 Ifr = 0goto6 

If | 8 | < | c | go to 6 

Print ii, 8 

Ify = n go to 20 

7=7 + 1 

Do up to 15 for m = 1(1) n-y+1 
15 4n = K 

m m 

Go to 5 





Read and print o, (i = to «) 

If- 1 

" = — «' fl »-i 

Set b„ m r„ - o 


Setr = 

Set e = 8 
ii = ii+8 

5 - -6,-y t , /£•„-/ 


Is [5/i*i <10" 4 ? 

/■by 1 

hr<20?) >- 

:isr = 07 ^_^ 



Print «, 8. 
«. fori = OOVi-J+l 



o„ by />„ 

Yes ""■ -*■ No 

Fta. 3.2 Flow-diagram for Program 3.6 





of «. Note that it is a better check on the accuracy of the 
calculation to print out 5 than to print out the residuals 
which may be small even though the roots are inaccurate, 
as shown in §2.6.) Once the process has converged the 
program tests to see if j = n which indicates that the 
polynomial has real roots, all of which have been found. 
If this is the case the calculation is stopped. Otherwise the 
new polynomial />„_/*) is set in place of P„. J+l (x) and the 
iteration is repeated. The variable u is already set up as the 
root of /»„__,+ ,(.v) which has just been found, and this is 
the appropriate first approximation to the smallest root of 

§3.4. General comments. We shall illustrate briefly 
some differences between automatic and hand computing 
by considering the problem of finding the real roots of a 
polynomial, assuming that the program of Ex. 3.3 is used 
for the automatic computer and a similar method is used 
for the computation by hand. 

(i) In hand computing a convenient method for locating 
the roots is to evaluate /(oo),/(0), /(-co) and then/(x) 
for suitable intermediate x. We should probably draw a 
rough graph of y = f(x). Although it is not impossible 
to program this type of procedure we have preferred a 
simpler method for the automatic computer program. 

(ii) In hand computing, when calculating the smallest 
root of each of the polynomials P n -,(x) we would usually 
try to obtain an accurate first approximation for this root 
rather than merely taking the root of P n _ J+l (x) found at 
the previous stage of the calculation. 

(iii) In hand computing we can control the number of 
decimal places used in the calculation. If the final answer 
is required to a large number of decimal places we would 
use a comparatively small number of decimal places in the 
early stages but increase the accuracy as the procedure 




converges. There is no point in doing this with an auto- 
matic computer. 

(iv) In hand computing we would decide the number of 
decimal places accuracy required in the final result and 
adjust the calculation accordingly. In automatic computing 
it is simpler to work to the full accuracy available in the 
machine and determine the " noise-level " for each root. 

(v) In hand computing we would check at the end of the 
calculation that we have located all the real roots. We 
have not incorporated any such check in our program. 
When using a computer it might be simpler to merely 
calculate all the roots, both real and complex, and then we 
should be sure that all the real roots have been determined. 

(vi) In hand computing a convenient check is to divide 
the final polynomial with complex roots into the original 
polynomial, which should give zero remainder, though 
there will in fact be a small remainder from inaccuracies 
due to rounding-off error. Wc can then check that the sum 
and product of the known roots agree with the appropriate 
coefficients of the dividend. In automatic computing it is 
probably unnecessary to apply this check. 

The following are some general comments. 

On the whole, in programming for an automatic 
computer we prefer a numerical method which involves a 
simple repetitive procedure, even though this is repeated a 
large number of times, to a method involving a complicated 
procedure which is repeated only a few times. The reverse 
is often true in hand computing. 

In hand computation the total number of operations 
involved is usually comparatively small and the effect of 
round-off errors can usually be made negligible merely by 
keeping one or two extra decimal places. In automatic 
computing the calculations tend to be large in scale and the 
number of arithmetical operations may be enormous. It 
may be much more difficult to estimate the effects of errors 




of various kinds and it may be important to use methods 
which minimise certain kinds of error. 

The possibility of an automatic computer making 
arithmetical mistakes can be ignored when using systems 
of the type described in this chapter. In this respect 
computation by an automatic machine is very different from 
hand computation. On the other hand errors such as loss 
of significant figures caused by subtracting two large nearly 
equal numbers to give a small one may be much more 
serious than in hand computation since they may easily 
pass undetected. It is most important to bear in mind 
that we have no information about what goes on in an 
automatic computer between the input figures and the 
figures which are printed out. Whereas in hand computa- 
tion we often recognise loss of accuracy or misleading 
results due to unsuitable methods, while doing the calcula- 
tion, an automatic computer will not do this for us unless 
we realise beforehand that this may occur and allow for it 
in the program. 

Other aspects of the suitability of methods for automatic 
computers will be mentioned in connection with the various 
topics dealt with later in this book. 

The main objects of describing the simplified program- 
ming scheme in this chapter are threefold. 

(a) To give the student some idea of how a program is 

organised for an automatic computer. The 
machine must be told in detail what to do, but it 
can control its own calculations to a limited 

(b) To give the student a programming system by means 

of which he can write his own programs for the 
standard methods used in numerical analysis. 
We have already illustrated this in connection 
with the Newton-Raphson procedure and further 
examples will occur throughout this book. 




(c) To give the student some understanding of the 
differences between hand computing and machine 
computing, and of the criteria which govern 
whether a method is suitable for automatic 

However, the reader should not be misled by the simplicity 
of the programming system and programs in this book. 
One of the major preoccupations of anyone using an auto- 
matic computer is the efficiency of the procedure used, 
since computer time is expensive. In practice full advantage 
is taken of special features of a computer and its program- 
ming system in order to devise the easiest and cheapest 
method for doing a particular problem on a particular 
machine. These factors often over-ride purely mathematical 
considerations. They lie outside the scope of this bonk. 

Examples III 

Ex. 3.4. It is required to find a real root of an equation 
/(.v) = by means of the following procedure. Two 
numbers a and h are given such that/(a)<0,/(/>)>0. We 
define a sequence .v„ y r by the equations 

x = a, y =b 

fo+i = i(v r +;v). JY + !=>-,. if /{M-v,+>V)}gO, 
[x r+l = .v„ y, + , = K.v,+>v), if /{i(v r +^)}>0. 

(i) Write a program to determine a root to within 10" m , 
where in is an integer specified in the input data, and it is 
assumed that the number of significant figures used in the 
machine will allow determination of the root to the required 
degree of accuracy. 

(ii) Write a program to determine the root to the 
maximum number of significant digits allowed by the 
accuracy of the evaluation of f(x), assuming that this is 
limited by round-off error. 




(Assume that " Read/(.v) " means that a set of instruc- 
tions is given to the machine for evaluating/ (at). Whenever 
f{y) for instance, occurs on the right of an arithmetic 
statement the machine will evaluate f(x) for x = y and 
use the resulting number in the appropriate place in the 
arithmetic statement.) 

Ex. 3.5. Write a program for finding arc tan x by the 
sequential procedure given in Ex. 2.14. 

Ex. 3.6. Write a program (or a flow-diagram) for 
computing sin x and cos x for any value of x, - co <x< go, 
given that for 0<Mgn/6 


sin i/ = . 

cos » = 

iS + bt + b^ + bt)- 1 ' 

where the a,(i = 1 to 4), b,(i = 1 to 4) are given constants. 
For 7r/6<2i><7i/3 use the relations 

cos 2u = 2 cos 2 v-\, sin 2v = cos (\n-2v). 

(The formulae for sin, cos, are discussed in A. Ralston and 
H. S. Wilf, Mathematical Methods for Digital Computers, 
Wiley (1960), p. 23.) 

Ex. 3.7. Write a program for evaluating the n smallest 
positive roots of the equation cot X = kx by an iterative 
procedure for k = a(b)c (cf. Ex. 2.9). Outline briefly some 
of the principal differences between the way in which this 
calculation could be performed on an automatic computer, 
and the way in which it could be performed by desk 
calculator. (In addition to the points in § 3.4, for hand 
computation we should use mathematical tables for cot x 
whereas in a computer functions need to be calculated. 
This may influence the choice of method. In hand com- 
puting we should check by differences as in § 7.3 below.) 


§4.1. Introduction. Very many problems in applied 
mathematics can be reduced to the solution of a set of 
simultaneous linear algebraic equations. It is therefore of 
great importance to have efficient methods for the numerical 
solution of such systems. 

If we consider the equations 

fl M .V,+fl|,.V 2 + 

+ «■»*„ = h„ 

"„!-v,+a„2-Y 2 + ». + a nn x n = b n , 
the solution can be written, by Cramer's rule,f 

Xj = A,-/A, (J = 1 to n\ 
A = 








and A y is A with the./th column replaced by the column of 
As. This solution is useful for certain kinds of problem 
but it is seldom used in numerical work when the coefficients 
arc pure numbers since much more efficient methods are 
available for this case. As a general rule the direct evalua- 
tion of determinants is laborious and inefficient and should 
be avoided wherever possible in numerical work. 

t See A. C. Ailken, Determinants and Matrices, Oliver and Boyd 
(1962). p. 55. 





The above solution is of theoretical interest. It breaks 
down when A, the determinant of the coefficients, is zero. 
In this case we may have either a multiplicity of solutions 
(when the Ay arc also zero) or inconsistent equations with 
no solution (where at least one of the Ay is non-zero, so 
that if we imagine a limiting procedure in which A tends to 
zero, the formula (4.2) gives apparently infinite .Vy). We 
need not go into detail since it will be assumed, when solving 
simultaneous linear equations, that the determinant of the 
coefficients is non-zero. It can be anticipated, however, 
that difficulties will arise if A is nearly equal to zero (see 
§ 4.4 below). If the b t in (4.1) are zero then the Ay in (4.2) 
are zero, and non-zero Xj exist only if A = 0. This case is 
of importance in Chapter VI. 

It is convenient to distinguish between direct and iterative 
methods of solution of linear equations. The amount of 
computation involved in direct methods can be specified 
in advance whereas an iterative computation is repeated 
until the error is less than the required amount. Direci 
methods are discussed in this chapter and ihc next. Iterative 
methods are discussed in § 6.4. 

When dealing with numerical examples we shall omit 
the Xj and the addition and equality signs in (4.1) and write 
merely an array of numbers: 




When working to a fixed number of decimal places it is 
usually essential to arrange that the largest element in each 
row and column of this array is of order unity. Tin's can 
be done by multiplying the original equations by constants 
and introducing suitable (preferably simple) multiples of 
the unknowns. We return to this point later (§ 4.3). If the 




original coefficients are such that a u = fyj (i.e. if the a i} 
are symmetrical about the principal diagonal a n ,a 22 , • •• O 
the scaling should be arranged to preserve the symmetry. 

Ex. 4.1. Scale the equations 

80*,+ 5X 2 - 0-2*3 = 2-8, 

5*!- 0-2*2 + 0-0125* 3 = -00875, 

-0-2A', +00125^+ 000625*3 = -0025. 

The scaling factors that should be used are more or less 
arbitrary. If, for example, we introduce 

.xr, - 40*!, x 2 = 2X 2 , x 3 = 0-5*3, 

and multiply the equations by 1, 20, 80 respectively, we 

Zv, + 2-5a- 2 -0-4.v 3 = 2-8, 

2-5.v,- 2.v 2 +0-5x 3 = -1-75, (4.3) 

-0-4a, + 0-5.v 2 + x 3 = -2. 

§4.2. The method of successive elimination. The 

method of successive elimination is well-known in elementary 
algebra. Thus to solve (4.3) we can subtract 1 -25 times the 
first equation from the second, and add 0-2 times the first 
equation to the third. This gives two equations in .v 2 , .v 3 . 
Similarly x 2 can be eliminated from these two equations to 
give .x 3 . Then back-substitution in previous equations yields 
.v 2 , .v, in succession. 

The object of this section is to describe a tabular 
procedure, with frequent checks, for the practical application 
of this method when the coefficients are given decimal 
numbers. The calculation and the checks can be arranged 
in the tabular form given below. Although this is written 
out only for a set of four equations in four unknowns, the 
procedure in the general case should be clear. The square 
brackets will be explained later in Ex. 5.4. The numbers in 
round brackets are used for checks. 
















b 2 


1 «,. 


a i2 




(s 3 ) 



a 4 2 







(S 3 ) 

(S 4 ) 

(S 5 ) 

CS = s) 





(h * T 2 ) 




b i3 



('3 * T 3 )] 







('4 * T 4 )] 





(«3 * V 3 ) 






(»4 * L/ 4 )] 




("4 * V A ) 



x, x 2 x 3 x 4 (S' s x S s ) 

The first step is to form check sums which will be used 

s, = a n + a l2 +a l3 +a l4 + b h (/ = 1 to 4), 

Sj =a u +a 2J + a 3J + a 4J , (j = 1 to 4), 

S s = b,+b 2 +b 3 + b 4 , 

s = s, +s 2 +s 3 +s 4 , 

S = S t +S 2 + S 3 + S 4 + S s . 

We should find s = S exactly since no round-off errors are 
involved, and this is the first check on the numerical work. 
We then calculate 

'21 = 02i/0n. /31 ■ 031/011. '41 = 04l/0Jl- 




The 4x4 set of equations is reduced to a 3x3 set by 
eliminating the first unknown: 

b u = fly-/naij, (/,; = 2, 3,4), (4.11) 

c, = b,-l n b„ 

t, = ft-Jafe ( 412 > 

Note that the steps involved in applying these three formulae 
are of exactly the same kind, so that the procedure is easily 
reduced to a routine. This holds also for the similar elimina- 
tions which follow later. The b,j, c,, 1, are calculated line 
by line and at the end of each line we perform a check by 

T l = b i2 + h i3 +b u +c„ (1-2,3,4), (4.13) 

which should agree with the f, to within a unit or two in the 
last significant figure, any difference being due to rounding 

In the same way we calculate 

^32 ■ b 32 /b 22 , hi ■ b A2 lb 12 . 
The 3 x 3 set of equations is reduced to a 2 x 2 set by forming 

cij = K ~ lafafi ('". J = 3. 4), (4. 14) 

with similar equations for d,, u ; , and a check 
Ui = e a + c IA +d„ (f-3,4), 

which should agree approximately with u t . 

The 2 x 2 set of equations is reduced to a single equation 

"44 = c 44 — '43 c 34> 

with similar equations for e 4 , u 4 . Also V 4 = d 44 +e 4 xv 4 . 
At this stage we have reduced the original equations to 




the system (4.4), (4.5), (4.8), (4.10): 

ail*l+«12*2 + *13*3 + «14*4 = *I 

b 22 x 2 + b 2i x 3 + b 24 x A = c 2 (4. 16) 

c 33 x 3 + c 34 .y 4 = f/ 3 

^44*4 = *4- 

The unknowns can now be found by back-substitution. 
We have 

-v 4 = eJd AA , 

* 3 = (</ 3 -c 34 .v 4 )/c 33 , etc. 

A check on the final answer is provided by the equation 

S' s = S,x, + S 2 .v 2 + S 3 .x 3 + S 4 .v 4 . 

S' s should agree with the previously computed S 5 except 
possibly for a small difference due to rounding-off errors. 
However, agreement between S s and S' s to a given number of 
decimal places does not necessarily indicate that the x t are 
accurate to this number of places (see § 4.4). 

A group of in sets of equations with the same a t] but m 
different right-hand sides b, k (k = 1 to m) can be solved by 
including m columns b lk instead of a single column b t . 
Only one check column is needed, obtained by adding all 
the elements in the corresponding rows. 

Ex. 4.2. A numerical example is worked on the following 
page. An extra figure is carried in the intermediate cal- 
culation, but the roots are rounded to four decimal places. 

The positions of numbers correspond exactly with those 
given earlier except for the occurrence of only a single 
number in the right-hand check columns. Consider, for 
example, t 3 and T 3 in (4.6). In the numerical calculation, 
instead of writing down the two numbers separately, we 
first of all write down f 3 using formula (4.12) which gives 








oo «*< 2 £ £! 
— .^ r- 00 — 

Cl fS Ci.C-2 


O t r~ oo o\ 
CI CS (S — o\ 

— o 

O ft 

<N p~ ci r- on 

Cl in <N ^- SO 


>t rp rp ri <t 


<n r- >* 
r-i in op 

ci ts 


in r~ ci in — 
6 6 6 6 <n 

o ,—* 

-^ !C -~ -~ ■* 

r-- ff» <n n»j 

O ■*. Os ci on 

in — in I-- ci 

NO- so r— 

so <N ci O O 

6 6 6 6 6 

Tt rr r- 



I I 

r~ on r— 


O O © 


© © o 

6 6 6 


O — r- 
O O ■* 
oo so r-~ 
N O — 
6 6 6 



m ci 

ci ci 

S— T 
. O 
o 6 


ON On 

(N in 


6 6 

2; ® 

ci in 


6 6 

r- _ 

so sO 

in on 

© on 

6 M, 


on fs 

00 i~- 

8 S 

o so 







0-201 49. Wc then calculate T 3 using formula (4.13) and 
this gives 0-201 50. The difference between « 3 and J 3 is 
permissible. For checking the next stage of the calculation 
it is necessary to use the exact row sum T 3 , not / 3 (why not?), 
so we amend the recorded value of f 3 by crossing out the 
last two digits, and replacing them by 50. 

In this example, since the square array of coefficients a u 
is symmetrical about the diagonal line containing a xu a llt 
a 33' O44, each of the square arrays derived from these 
coefficients is symmetrical, apart from small differences due 
to rounding-off errors. This can be used as a check, or 
only half the non-diagonal terms need be computed. 

§ 4.3. Choice of pivots and scaling. In our description 
of the method of successive elimination in the last section 
the element in the top left-hand corner of each array plays a 
very special part. It is called the pivot. The / /; are obtained 
by dividing each of the other elements in the same column 
as the pivot by the pivot. It is obvious that in any given 
array any non-zero element can be used as the pivot. 
Suppose that in the a tJ array wc choose a„ as the pivot. 
The rth row, which contains the pivot, is set aside. We 
calculate, from the elements in the sth column 

As - cja rs , (/ = 1 to 4, omitting I = r). 


O ci >n (N 



>— ■ 

Tf NfOsOsO 


in in t ci 



6 6 6 6 



r- 5 'fr 

O O 1- 

in ci 


so r- 





SO O t- 

— rs 



Os oo so 

fS so 



O O 



where / = 1 to 4, j = 1 to 4 omitting 1 = r,j = s, i.e. the 
row and column containing the pivot. The routine is 
exactly as before. 

In the elimination procedure of §4.2, equations (4.1) 
with n = 4 were reduced to the form (4.16). It is easy to 
see that the determinant of coefficients has not been altered 
since one determinant is obtained from the other by adding 


multiples of certain rows to other rows: 

A = 











a 22 

«2 3 


/> 22 


a 3i 





o 41 

fl 42 

04 3 




f>2 2^33044- 



This is an important result, that the determinant of the 
coefficients of the original equations is given by the product 
of the pivots. It can be proved in the same way that this 
result holds however the pivots are chosen, apart possibly 
from the sign of the determinant. 

It is sometimes stated without qualification that the 
largest element in an array should always be chosen as the 
pivot but the importance of this can be over-rated. Before 
discussing this we consider an example. 

Ex. 4.3. Soke the equations in Ex. 4.2 by the method of 
successive elimination, using the largest element in each 
array as the pivot. 

If we assume that the coefficients of the equations in 
Exs. 4.2, 4.3 are exact and solve the equations using a large 
number of decimal places in the intermediate calculations 
we find that the exact roots are 

x, = -5-3868, .v, = 2-8137, .v 3 = 11-5729, .v 4 = -6-3653, 


rounded to four decimals. The differences between this exact 
solution and the results in Exs. 4.2, 4.3 are given in Table 4. 1 
which shows that, in this particular example, the results 
obtained by taking the largest pivot at each stage arc less ac- 
curate than the results obtained by choosing the top left-hand 
pivot. The reason for this is the smallness of the final pivot in 
Ex. 4.3, namely 0002 51, which is considerably smaller 










































6 6 















o t- 

























o © 



© © © © 











o •» t 







— 1 










■^- rr, 





o o o o 



O © O 


fi r- rs vi 


•/I r* ci «i 

o o o o 


192 13 
195 02 
012 74 

Ov r- 
o O 

O vo 

o o 

o o o o 

© o © 

o o 


r— — oo 

m vo 
VO <N 


3 vO VD 

O O 
OO fS 








o o 




lhan 0008 93, the smallest pivot in Ex. 4.2. The size of the 
smallest pivot in Ex. 4.3 limits the accuracy of the roots to 
at most 1 in 500, in general. The reason for the very small 
final pivot in Ex. 4.3 is clear from (4.17) which states that 
the product of the pivots must be a constant. If we insist 
on choosing as pivot the largest number available at each 
stage this tends to lead to smaller pivots than might other- 
wise be obtained. The logical conclusion of this argument 
might seem to be that the pivots should be chosen so that 
they are all nearly equal to each other, and therefore each 

Tablb 4.1 

Comparison of solutions obtained by the method of 
successive elimination 

*f«(*l) »-W cle. 

Top iefl pivots (Ex. 4.2) -00160 00087 00161 -00081 

Largest pivots (Ex. 4.3) -00402 00213 00405 -00198 

Top left pivots, afler -00098 00055 00099 -00052 


Largest pivots, afler 00148 -00077 -00149 00071 


pivot should be roughly equal to the /rth root of the deter- 
minant. Apart from the fact that this is impractical since 
the determinant is not known at the beginning of the 
analysis it is not true that increasing the size of the smallest 
pivot automatically increases the accuracy. It will be 
shown in the next section that a certain inherent inaccuracy 
is present in the roots of a set of simultaneous linear equa- 
tions if these are found by any procedure involving rounding- 
off errors. As long as the pivots are large enough so that 
round-off in the smallest pivot does not imply a relative 
error greater than the inherent error in the calculation, it 
would seem that little additional accuracy can be obtained 
by choosing the pivots in any special way. The essential 
point is to avoid very small pivots rather than to arrange 




that the smallest pivot is as large as possible. On the whole 
the size of the pivots would seem to be of second-order 

In §4.5 below, when programming the method of 
successive elimination for automatic computers, we shall 
pivot on the largest element in the left-hand column of 
numbers in the array obtained after each elimination. This 
avoids unnecessarily small pivots, and the multipliers are 
all less than unity which may be convenient for some 

The object of illustrating that the choice of pivots in 
Ex. 4.2 gives more accurate results lhan the choice in Ex. 4.3 
is not of course to imply that the choice of the top left-hand 
number as pivot is to be preferred to the choice of the 
largest pivot at each stage. Many of the equations which 
occur in practice are such that, when calculating by hand, 
they are suitable for solution by choosing the top left-hand 
number as pivot even though this is not the largest possible 
pivot. On the other hand if the top left-hand number is 
appreciably smaller than the other numbers in an array 
then it should certainly not be chosen as pivot. 

Suppose next that we rescale the equations of Exs. 4.2, 
4.3 by multiplying the third equation by two and the fourth 
equation by four and introduce new unknowns 

x\ = .v,, x' 2 = x a , x'i = i* 3 , *; = ].v 4 . 

It will be found that if the pivots are chosen as the top 
left-hand numbers in each array as in Ex. 4.2 the pivots are 

0-5400, 0-28009, 0093 96, 0142 90. 

If the pivots are chosen to be the largest numbers at each 
stage as in Ex. 4.3 then the pivots are 

70032, 0-744 65, 0155 67, 0-002 49. 
The differences between the exact results and the results 




given by the two methods are given in Table 4. 1 . It is seen 
that although scaling has improved the results slightly the 
improvement is not striking. The smallest pivot when 
using the top left-hand pivots is now comparatively large, 
but the smallest pivot obtained by taking the largest pivot 
at each stage is almost exactly the same as in Ex. 4.3 before 

The point about rescaling is exactly the same as the 
point previously made about the choice of pivot. Rescaling 
can be advantageous in avoiding very small pivots rather 
than arranging that the smallest pivot should be as large 
as possible. Once the smallest pivot is above a certain 
minimum value there is little point in further rescaling. 

In practice if it turns out that the smallest pivot is 
undesirably small it is usually preferable to improve the 
solution by the method described in the next section 
(Ex. 4.4) rather than repeat the solution with a different 
choice of pivots, rescaled equations, or more decimal places. 

§ 4.4. Inherent error and ill-conditioned equations. This 
section is concerned with a characteristic difficulty which 
occurs when solving linear equations. We first consider a 
concrete example: 

1-985.Y, -l-358.v 2 = 2-212, (4.19) 

0-953.y,-0-652.y 2 = 1062. 

Elimination of .v, gives 

(1 -985 x 0-652-0-953 x 1 -358).v, 

= - (1-985 x 1 -062-0-953 x 2-212). (4-20) 

On working to six decimal places we find 

0-000 046.v 2 = -0000 034, x a = -0-739.... 

Similarly .v, = 0-609.... In quoting the answers to three 
decimal places we have in fact assumed that the coefficients 




in (4.19) are correct to seven decimal places and that we 
have worked to seven decimal places so that x 2 = —340/460, 
x t = 280/460. If after evaluating the quantities in brackets 
in (4.20) we had to round off to five decimal places we 
have .v 2 = -0000 03/0000 05 = -0-6..., and the answer 
is not even correct to one decimal place. Round-off 
imposes a fundamental limitation on the accuracy with 
which the roots can be obtained. 
If instead of (4.19) we solve 

1-985*, -1-358j: 3 = 2-212, 

0-953.v,-0-651v 2 = 1061, 

we find .v, = 3001..., x 2 = 42-41.... An alteration of 
one coefficient of (4.19) by one in a thousand (1062 
changed to 1-061) has altered the roots by factors of 
about fifty to one (0-61 to 30, and -0-74 to 42). 

The reason for the behaviour of the above equation is 
clear. The second equation is nearly a multiple of the first 
so that we can write the equations algebraically as 

ax l +bx 2 = c, 
(ka+a)x l +(kb+P)x 2 = kc+y. 

where A: is a suitable constant and a, p, y are small, 
roots of these equations are 


*i - 

cp—by ca—ay 
x 2 = 



The small numbers a, p, y have a dominant effect on the 
result. In computing determinants, large numbers have 
cancelled by subtraction to give small final results. The 
determinant of coefficients is small relative to the individual 
terms in the expansion of the determinant. (Notice that it is 
meaningless to say that the determinant is small — it has to 
be small relative to some other quantity.) 







If X, are approximate roots of the set of linear equations 
(4. 1 ) we define the residuals r, as 

r t = Z a ij X J~ b i- 
J" i 


For equations (4.19), If we take X, = 0-638, X 2 = -0-696 
we find 

r, = -0-0004, r 2 = -00002. 

Roots which differ from the true roots by 1 in 20 produce 
residuals which arc less than the round-olT error of the 
coefficients. This illustrates the important point that 
smallness of the residuals does not necessarily mean that the 
corresponding estimates of the roots are accurate. 

Equations like (4.19) are said to be ill-conditioned. The 
opposite of ill-conditioned is well-conditioned. We talk of 
the " condition " or " conditioning " of a set of equations. 

As already illustrated, characteristic signs of ill- 
conditioning are: 

(a) To determine the roots to a specified number of 

decimal places when round-off error is present it 
is necessary to work to a much larger number of 
decimal places in the intermediate calculations. 

(b) Small alterations in the coefficients produce large 

variations in the roots. 

(c) The determinant of the coefficients is small in the 

sense described above. 

(d) Estimates of the roots which differ appreciably from 

the true roots produce very small residuals. 

Two other indications of ill-conditioning depend on 
ideas which will be introduced later. In terms of the inverse 
matrix defined in § 5.3, equations are ill-conditioned if the 
elements of the inverse matrix are large in the sense described 
at the end of Ex. 5.6. In terms of the concept of an eigen- 
value defined in Chapter VI, equations are ill-conditioned 

if the ratio of the largest to smallest eigenvalue of the matrix 
of coefficients is large. 

The equations in Ex. 4.2 arc slightly ill-conditioned (see 
also Ex. 5.6). 

If the diagonal elements a,,, a 22 , ••• o m are much 
greater than the non-diagonal elements a l} (i jkf) then the 
equations are well-conditioned. If all a (j (/' /- j) are zero 
then the equations are perfectly conditioned. If the 
coefficients vary in size it is usually convenient to arrange, 
by interchange of equations and/or scaling, that the largest 
coefficients lie in or near the diagonal positions a, , , a 22 , . . . a„„. 

It is useful to distinguish between two different aspects 
of accuracy when solving linear equations. 

(i) If the coefficients of the equations are assumed to 
be exact but a fixed number of decimal places are 
carried during the solution, how many decimal 
places are accurate in the answer? 
(ii) If the coefficients of the equation are inexact (for 
example because they arc obtained by rounding 
off more accurate numbers) to how many places 
are we justified in quoting the final answer? 

It is easy to answer both of these questions in any 
specific case, provided that the detailed solution of the 
original equations is available and we are prepared to do a 
little extra work. Suppose that in case (i) the roots obtained 
by working to a fixed number of decimal places are denoted 
by Xj and we define residuals by (4.21) as before. Suppose 
that the exact roots Xj are related to the approximate roots 

Xj = Xj+S.Xj, (j = 1 to »). 

On substituting in (4.1) we readily find that the 5xj are the 
roots of the equations 

t arfxj = h,- t "ijXj = -r„ (I = 1 to n). (4.22) 

7 = I 7=1 


The procedure is illustrated in the following example. 
Ex. 4.4. Find the accuracy of the roots obtained in Ex. 4.2. 
The residuals are found to be: 

10 5 r, = 0010, 10 s r 2 = -5072, 
10 5 r 3 = -2-512, 10 5 r 4 = -2-506. 

The major part of the work involved in solving (4.22) has 
already been carried out in the table in Ex. 4.2 and it is 
necessary to calculate only one new 6, column. The results 
are almost exactly those given in Table 4.1 : 

<5x, = 00160, fa, = 00087, 

8x 3 = 00161, fa, m -00080 5 . 

Hence the roots in Ex. 4.2 are accurate to within one or two 
digits in the second decimal place. 

In order to answer the question in (ii) above suppose 
that possible errors in a u , A, are denoted by da,,, db h and 
suppose that these alter the roots from Xj to Xj + fay. Then 

I (flu+Sorfixj+tej) - h i + Sb t- 

j = » 

On neglecting second-order terms we find 

n n 

V flyfa. = 5bi- £ f>a,jXj, (i = 1 to nr). 

If \j = Xj+Ej, where the X } are approximate roots, we 
obtain, again neglecting second-order terms, 

£ auSx, = sb t - t 8°u x j = »i. sa y- ( 4 - 23 ) 

;= i ;■= i 

In many cases we can regard a solution as satisfactory 
if the error in the solution is comparable with the inherent 
error due to inexact coefficients. From (4.22) and (4.23) 




it is seen that a solution is satisfactory in this sense if the 
residuals | r, | are less than some average value (such as the 
root-mcan-square value) of the magnitude of the quantities 
Pi deduced from the possible errors in coefficients. As an 
example consider the results in Ex. 4.2. The maximum 
residua] has magnitude 5.10" 5 (Ex. 4.4). The possible 
errors in coefficients can produce a maximum error of 






1+ I 

Statistically the error will be considerably less than this but 
it is unlikely that it will be reduced by the factor of 25 
required to make the average | v, \ comparable with the 
largest residual. Hence the solution obtained in Ex. 4.2 
is satisfactory in the sense defined at the beginning of this 
paragraph. This type of result would seem to be the rule 
rather than the exception when the method of successive 
elimination is used, if the equations are scaled properly and 
one or two guarding figures are kept in the body of the 
calculation. The following example illustrates an empirical 
approach to the problem. 

Ex. 4.5. Alter the coefficients in the equations in Ex. 4.2 
in the following way. Write the ten numbers — 4( I )5 
on slips of paper and for each a,j and b t pick one of the 
ten slips at random. Alter the corresponding coefficient 
by \0~ i times the number on the slip unless this number 
is 5, in which case alter the number by 5.10" 3 if the 
end-digit is even and -5.10" 5 // the end-digit is odd. 
Find the change produced in the roots (4. 1 8). 

An experiment of this type gave 

10 4 u, = -5-23, I0 4 i> 2 = 6-98, 
10*b 3 = -4-60, I0 4 t> 4 = 4-46. 
Sx, = -00852, 8x 2 = 00447, 
«5x 3 = 00814, 8x 4 = 00398. 




The errors in the solution in Ex. 4.2 (see Ex. 4.4) are about 
one-fifth of these. 

We have now gone as far as we can, within the scope and 
limitations of this book, in considering the errors involved 
in solving simultaneous linear equations. An interesting 
theoretical treatment from a comparatively elementary 
point of view is given in Modern Computing Methods, 
H.M. Stationery Office, 2nd Edn. (1961), Chapter 5. 

§ 4.5. A computer program for the method of successive 
elimination.-':- Program 4.1 is a computer program for 
solving an n x n set of simultaneous linear equations by the 
method of § 4.2. The unknowns are eliminated in the order 
x lt x 2 , •■■ x„. The pivot at each stage is chosen as the largest 
element in the first column of the appropriate array. When 
the statement labelled 12 is reached, this largest element 
has been determined and the appropriate rows have been 
interchanged so that the row containing the largest pivot 
is now the first row in the array from which the next 
unknown is to be eliminated. When statements 16 and 17 
are completed the appropriate unknowns have been 
eliminated and the machine proceeds to eliminate the next 
unknown. The remainder of the program is concerned 
with back-substitution, computation of the determinant, 
and printing out the results. (It is advisable to print out 
some measure of the condition of the equations. The size 
of the determinant and the smallest pivot give some indica- 
tion of whether the equations are badly conditioned. In 
order to save unnecessary printing it may be arranged that 
these are printed only if they fall below certain limits.) 
These hints should enable the reader to work through the 
program in detail, as an exercise. This will be more in- 
structive than pages of further explanation. 

t Sec also Modern Computing Methods, H.M. Stationery Office, 
2nd Edition (1961), Chapter 2. 




Program 4.1 
••Abel Statement 

Read n, a u for /= 1(1) « and /- 1(1) n, 

b i hTi= 1(1) „ 
Print n, a tJ for /= 1(1) n and /- 1(1)/;, 

b, for i = 1(1) «, 
Do up to 3 for i = 1(1) « 

3 a i.n+i = b t 

Do up to 17 for r = 1(1) n-] 
q = r 

4 p = q 

5 If q = n go to 10 

<7 = <7+l 

6 If a pr <a v go to 4 
Go to 5 

10 Do up to 12 for y = r(l)/i + l 

c l = a n 

2 a Pl = Cj 

16 Do up to 17 for/ = r + 1(1) nand 

j = r+l(\)n + l 

17 a n = a IJ -(a lr a rJ )/a„ 

X n = «,,.tl/«,,, 

Do up to 22 for r = /j-l(-l)l 

Doupto20fory = r + l(l)// 
20 d = d+a rj xj 

22 *, = («,.„ + . -</)/*„ 

D = a lt 

Do up to 25 for i = 2(1) // 
25 D = Da„ 

Print Xj for; = 1(1) n, a n for i = 1(1) n, D. 





Examples IV 

(Examples for practice in numerical work can be 
derived from those in the text by rescaling the equations in 
Ex. 4.2 or by working the examples to a different number of 
decimal places. Some of the Examples V are relevant to 
this chapter.) 

Ex. 4.6. Modify Program 4.1 so as to solve m sets of 
equations with the same set of coefficients a,j but m different 
right-hand sides b tk (k = 1 to in). 

Ex. 4.7. Draw a flow-diagram for Program 4.1. 

Ex. 4.8. Show that the number of multiplications 
involved in solving n simultaneous linear equations in n 
unknowns by the method of successive eliminations (§ 4.2) 
is of the order of J/; 3 . (This is an important result because 
the main labour in solving large sets of equations lies in the 
multiplications, so that the labour increases rapidly with 
the size of the set of equations. The same result holds for 
the compact elimination method of § 5.2 below.) 

Ex. 4.9. Consider the set of equations 

(ijXj _ , + bj.\j + CjXj + , = (I j, (j = 2, 3, . . . n - 1), 
a n x n - i + b n x n = f/„. 
Show that the x, can be determined from the formulae 

0i = -'. 
b \ 

yj = 

b j- a j9j-i 


, *', h^ A r a h-\ = 2,3,... M ), 
»i bj-atfj-! 

X. = K< *j = hj-gjX J+l , (j = B-l, m-2, ... 1). 




If aj<0, bj>0, Cj<0, and aj+bj+Cj^O, show that 
— lg0y<O. (The number of operations involved is 
proportional to n compared with a number of operations 
proportional to n 3 for a general set of equations. The 
method can be justified from first principles but it is 
instructive to consider it from the point of view of the 
matrix decomposition in Ex. 5.11. Equations of the above 
type occur in connection with solution of the heat-conduc- 
tion equation by implicit difference methods as in Ex. 11.2.) 



§5.1. Matrix algebra. Matrices provide a useful 
method for systematising both the theoretical and the 
practical aspects of certain computing procedures, par- 
ticularly in connection with automatic computers. We 
start Chapters V and VI with short summaries of the 
necessary theory,! to provide a guide for the beginner who 
might otherwise be confused by the mass of detail in text- 
books on matrices. 

An m xn matrix A is simply a rectangular array of nm 
numbers, arranged in in rows and // columns. It is said to 
be of order " in by n " or " m xn": 

A = [a,,] = 

"I! "12 

a 2l a 21 




The (/,y')th element a (i represents the element in the /th 
row and they'lh column of the matrix. 

Consider operations involving two matrices A = [a,j] 
and B = [by]. Equality, addition, and subtraction are 
meaningful terms if and only if the matrices have the same 
number of rows and the same number of columns. If this 

t Furlhcr information can be found in A. C. Ailken, Determinants 
and Man ices, Oliver and Boyd (1962). 





is true then: 

A = B if and only if a,] = b t] for all /, _/*. 

A±B**\ft lt ±b tl l 

This last equation means that the sum or difference of two 
matrices with equal numbers of rows and columns is the 
matrix such that any element is the sum or difference of 
the corresponding elements in A and B. 

To multiply a matrix by a scalar, say k, each term of the 
matrix is multiplied by k: 

kA = [ka tJ ]. 

A matrix with only one row is called a row matrix (or a 
row vector) and a matrix with only one column is called a 
column matrix (or a column vector). The scalar product of 
a row matrix and a column matrix, taken in that order, 
is meaningful if and only if the row and column have the 
same number of terms, and then it consists of a single 
number defined as in the following example: 

[X, x 2 ... x n ] 


= xiyi+x 2 yi+-+xj'm 

Two matrices can be multiplied together if and only if 
the number of columns in the first is equal to the number of 
rows in the second. Then the element in the ith row and 
the yth column of the product is the scalar product of the 
fth row of the first matrix with they'th column of the second. 
If A is m x n, B is n x p, and 

AB = C, 


then the rule determining c, k , the elements of the product 


matrix C, is 


c ik = £ "ijbjit, 


and C is m xp. As an example, 

a,, fl, 2 

021 a 22 
L.031 *32 J 

_*21 *22J 

0|1*11+0|2*2I. «H*I2+«U*M 

021*11+022*21. «21*l2+«22*2 2 

L«31*l 1+032*21. 031*12 + fl 32*22J 

The order in which matrices are multiplied together is 
obviously very important. The existence of the product 
AB is quite independent of the existence of BA and even if 
AB and BA both exist they will in general be unequal and 
there is no very useful connection between the two. In the 
product AB the matrix A is said to premultiply B, and B is 
said to postmultiply A. 

Although the commutative law does not hold for 
multiplication {AB ^ BA in general) the distributive and 
associative laws are true, e.g. 

(A + B)C = AC+ BC, A(BC) = (AB)C = ABC. 

It is useful to define certain special types of matrix. 
A matrix with the same number of rows and columns is 
called a square matrix. Associated with any square matrix 
A we have the determinant of the matrix: 

det/f = 




If A and B are square matrices of the same order then 
(Aitken, loc. cit., p. 80) 

det AB = del A . det B. (5.1) 

A diagonal matrix D = [d,j] is a square matrix with all 
the d,j zero except </,,, d 21 , ... d nn which are the elements 




along the principal diagonal, or the diagonal elements. A 
diagonal matrix all of whose diagonal elements are unity 
is called a unit matrix, and it is denoted by /. Clearly 

AI = IA = A, det 7=1. 

In matrix algebra / plays the same role as unity in ordinary 
algebra. The transpose A' of a matrix A is defined by the 
property that if A is an mxn matrix whose (/,y)th element 
is a,j then A' is an nxm matrix whose (**, y)th element is 
a jh i.e. [a,j]' = [fly,]. A symmetrical matrix is a square 
matrix with fl,y = fly,, i.e. A' = A. 

§ 5.2. A compact elimination method for the solution of 
linear equations. We describe a compact elimination method 
for the solution of linear equations which is essentially a 
telescoped version of a successive elimination method (see 
§ 4.2 and Ex. 5.4). It is convenient to work in terms of 
matrices. We consider only 3x3 matrices when describing 
the theory, but the extension to the general case is direct. 
Lower and upper triangular matrices /.. U are defined as 
matrices with zero elements above and below the principal 
diagonal, respectively: 

L = 



. u = 












The set of linear simultaneous equations (4.1), for the 
case n = 3, can be expressed in matrix notation as 

A = 





Ax = h 


, X = 


. * = 











We first of all show that A can be expressed as the product 
of two matrices in the form A = LU where L is a lower 
triangular matrix as defined above and U is an upper 
triangular matrix with units along the principal diagonal 
(i.e. u„ = 1). (The diagonal elements of cither L or U, but 
not of both, can be chosen arbitrarily.) Wc have 











a 22 













/,, / U U, 2 / U H 13 

'21 '2l"l2 + '22 '2l"l3 + '22"23 
.'31 '3l"l2 + '32 '3l"l3 + '32«23 + '33. 

On equating the individual elements in the first and last 
matrices we obtain equations which determine the elements 
of L and U. It is found that the elements of L and U can 
be determined as follows: 

(a) The first column of L: 

In = "ii> '21 - *au 'i\ = «si« 

(b) The first row of U: 

(c) The second column of £.: 

^22 = «22-/2l"l2. fa " «32-/3l"l2- 

(d) The second row of U: 

"23 = (.a 23 -I 2 l"t3)lll2- 

(e) The third column of L: 

'33 = fl33- / 3l''l3-'32''23- 




The importance of doing the computations in the above 
order is that the quantities required to compute the value of 
an clement at any stage of the calculation have already been 
computed at some previous stage. In hand computation it 
is of course important to evaluate sums of products in one 
machine operation, without intermediate recording. 

Assuming that L, t/are known, we can write Ax = b as 

LUx = b. 
We introduce y defined by 

y = Ux. 
Then (5.2) gives 

Ly = *. 

Written out in full, equations (5.3a, b) are: 

'2I.Vl + '22>'2 - l>i 

iiiyi+iiiyi+hsyz = b 3 




Xi+«ia*a+«i3*a = .Vi 

*a+»«3*a ■ y% ■' 

*3 = ^3 

The values of y lt y 2 , y 3 can be computed in succession from 
the first set of equations and then .v,, x 2 , x 3 can be com- 
puted in succession from the second set. If we solve the 
first set of equations in (5.4) for the y t wc sec that 

yi = hflut 

y2 = <6a-/aitt)//aa, (5.5) 

)'3 =(*3-/3!>'l-/32>'2)//33- 

Before considering the organisation of the calculation 
we show that numerical checks can be obtained in the 
following way. We first of all define the row sums 

S 3 = a 2l +a 22 +a 2i +b 2 , 

•*3 " «31 + fl 32 + fl 33 + '»3- 




A little reflection shows that if the b, on the right of the 
first set of equations in (5.4) are replaced by s it then the y, 
on the right of the second set are replaced by the quantities 




I, = l+u l2 + u l3 +y„ 

S 2 = l+H 2 , + )' 2 , 

I 3 = l+y 3 . 


The same quantities are obtained if we follow the procedure 
used to derive the y t : 

"i ■ (ft— ki ff dlhn 

o 3 *>iH-hiPt-h&iMhs> 


Algebraically the a, and I, represent identical quantities 
but in numerical work they are computed in different ways 
so that the numerical values may differ slightly in the last 
significant figure due to round-off error. The approximate 
equality of the a, and S, serves as an independent check on 
the calculation. 

We now describe a practical procedure for carrying out 
the above method. f The symbols have been defined above 
except for the column check-sums (cf. §4.2): 

S, = a n + a 2l +a 3i (/ = 1, 2, 3), ,9 4 - ftffifc+fc, 

S m S t + S 2 + S 3 + S A , s = s t +s 2 + s 3 . 

It is convenient to divide the calculation into three stages. 

t P. D. Crout, Trans. Amer. Inst. Electr. Engrs. 60 (194!), 1235-40. 
Another layout is given in Modern Computing Methods, H.M. 
Stationery Office, 2nd Edition (1961), Chapter 1. Which of these 
layouts is recommended depends on personal preference. 

(i) Write down the original matrix with check sums: 

a w fl t2 a 13 fc, faj) 
a 2l a 22 a 23 b 2 (s 2 ) 

"31 <»32 "33 ''3 fo) 

(S.) (S,) (S 3 ) (S 4 ) (S = s). 

(ii) Write down an auxiliary matrix. The method of 
deriving this matrix is given below. It may be 
noted at this point that the %, y h and a, are all 
determined by following exactly the same rule 
(cf. (b), (d), (5.5), and (5.6b) above, and the rule 
given in the paragraph preceding Ex. 5.1 below). 





(oi * S.) 



"2 3 


(o 2 * S 2 ) 

f M 



y 3 

(ff, * Sa). 

(iii) Obtain the unknowns by back-substitution in the 
second set of equations in (5.4) and apply the 
final check: 

S 4 = S t Xi + S 2 x 2 + S 3 x 3 WS 4 . 

The elements of the auxiliary matrix are written down in 
the following order: 

(a) Elements of the first column, /„. 

(b) Elements of the first row to the right of the diagonal, 

M iy. >"ii ff i- Then apply a check : 

I, = l+Uu+Uu+y^ffy. 

(c) Elements of the second column on and below the 

diagonal, l l2 . 

(d) Elements of the second row to the right of the 

diagonal, u 2J , y 2 , a 2 . (Notice that S, not o~, is 




used to compute <r 2 (cf. Ex. 4.2).) Then apply 
the check: 

£ 2 = 1+ "23 +J>a Ka i- 

(e) Elements of the third column on and below the 

diagonal, / 33 . 

(f) Elements of the third row to the right of the diagonal, 

y 3 , ff 3 . (Note that I,, I 2 not a,, a 2 are used to 
compute <r 3 .) Then apply the check: 

Z 3 = l+>"3 ft ff3- 

The elements of the auxiliary matrix are determined by 
the following general rule: Each element on or below the 
diagonal is equal to the corresponding element in the 
original matrix minus the sum of products of elements in 
its row with corresponding elements in its column, in the 
auxiliary matrix, where these products involve only pre- 
viously computed elements. Each element to the right of 
the diagonal is obtained by exactly the same rule except 
that finally we divide by the diagonal term in the same row. 
This last part of the rule applies to the My, ;'„ and a, but it 
does not of course apply to the checks I, which are defined 
as in (5.6a). If the matrix A is symmetrical, labour can be 
saved by noting that each element My of U is equal to the 
symmetrically situated element l J{ of L divided by the 
diagonal element /,,-. 

Ex. 5.1. Apply the above method to the equations: 

.Y,+4.Y 2 + x 3 + 3x 4 - 1. 

- .x 2 + 3.v 3 + .v 4 = - 4, 

3.v,+ .y 2 + 6.y 3 -10.v 4 = -11, 

-Y,-2.Y 2 + 5.Y 3 = 1. 




Show that the auxiliary matrix, including check columns, 

1 4 1 3 1 (10) 

-1 -3 -1 4 (1) 

3 -11 -30 1 -1 (1) 

1-6-14 5 2 (3), 

and deduce the solution 

x, = 10, x 2 = -3, Xj = -3, x 4 = 2. 

Ex. 5.2. Solve the equations of Ex. 4.2 by the above method 
The auxiliary matrix is (omitting the check column) 

0-5400 0-969 07 0-807 04 0-670 74 2-410 19 

0-5233 0-28009 —0-214 66 0-623 73 -3-640 80 

0-4358 -0060 12 0023 49 1-808 86 0057 65 

0-3622 0174 70 004249 0008 93 -6-357 51 

Roots: .v, = -5-3717, .Yi-2-8055, * 3 =ll-5575, .v 4 - -6-3575. 
Errors: Sxi=— 0-0151. 8.v 2 =0-0082, 8*3= 00154, Sx 4 = -00078. 

It is seen that the errors arc almost exactly the same as those 
obtained by the method of successive elimination in § 4.2 
(see Table 4.1). Theoretically we should expect the compact 
method to be slightly more accurate than the method of 
successive elimination since there arc fewer round -off errors. 

Ex. 5.3. Show that the compact method gives the following 
results for scaled equations in Ex. 4.3: 

Roots: .n=— 5-3715, * 2 = 2-8054, a- 3 = 11-5575, .v 4 - -6-3574. 
Errors: 8ai=— 00153, S.v 2 0-0083, S.v 3 -=0-0I54, 8x4= -00079. 

Ex. 5.4. Derive the compact method corresponding to the 
method of successive elimination o/§ 4.2, in the following 
way, from equations (4.4)-(4.10). 
We have e 4 = </ 4 -/ 43 f/ 3 . In this expression substitute 




8 5.3 



for d A in terms of c A , and then for r 4 in terms of b 4 to find 

e t = b A -U x b i -U 1 c 2 -l 4i d i . 

Similarly show that all the terms in (4.5), (4.8), (4.10) can 
be expressed in terms of quantities which occur in these 
equations, the original matrix, and the l l} . Show that a 
similar result holds for the l- tJ . Hence it is unnecessary to 
record the quantities in the square brackets in (4.6), (4.7), 
(4.9). Show that the resulting formulae are in fact exactly 
those involved in solving Ax = b by decomposing A in the 
form A = LU, where L has units along the principal 

In the compact method described above we have 
arranged that it is U which has units along the diagonal 
instead of L. Show that this is a condensed form of the 
method of successive elimination in which, for instance, 
we divide the row of quantities a ti , b lt s { in (4.4) by a lt 
instead of dividing the column a n by «,, and similarly for 
the other arrays. 

There are various methods of choosing the diagonal 
elements in L and U, and arranging the methods of successive 
elimination and compact elimination. The precise variant 
which is chosen is largely a matter of personal preference. 

§ 5.3. The inverse matrix. Suppose that, given an n x n 
matrix A, there exists a matrix Z such that 

AZ = I. 

Then it is possible to show that ZA = I. The matrix Z 
is called the inverse of A and it is usually denoted hy^~*. 
When A' 1 exists we have 

A~ l A = AA~ l = /. 

From this relation, using (5.1), we have 
det A. det A~ l = 1. 


Hence a necessary condition for the existence of the inverse 
of A is that det A j= 0. If this is true then the matrix A is 
said to be non-singular, but if det A = the matrix A is 
said to be singular and in this case the inverse does not 

Instead of dividing by matrices we multiply by the 
inverse matrix. As an example consider the simultaneous 
linear equations (4.1) which in matrix notation are 

Ax = b. 


We assume that the determinant of the coefficients is 
non-zero so that a unique solution of the equations exists, 
and the inverse matrix also exists. To solve (5.9) we wish 
to remove A from the left-hand side. Premultiply both 
sides by A~': 

A~ l Ax = A-'b. 

On using (5.8) we have 

x = A~'b. 


One practical procedure for the determination of the 
inverse matrix is obvious from its definition. Denote the 
yth column of A' 1 by z } and let dj denote the nx 1 column 
matrix whose elements are zero except for the jlh clement 
which is unity. Then AA~ l = /gives 

Azj = dj, (J= 1,2, ...«)• 

These are just n sets of simultaneous linear equations with n 
different right-hand sides of simple form, but the same 
coefficient matrix on the left. They can be solved by 
extending any method for solution of a single set of equa- 
tions. As an example consider the extension of the compact 
method of § 5.2. The " original matrix " will appear as 







follows : 


a i2 




a 2l 











(Si) (S 2 ) (S 3 ) (10) (10) (10) (S - s). 

There is still only one check column, obtained by adding 
all the elements in the corresponding rows. The " auxiliary 
matrix " will be of the form 



"12 "13 

'22 "23 




m 22 



(a, m I,) 

(<7 2 « Z 2 ) 
'31 '32 '33 '"31 '"32 '"33 (C3 * 2 3 ). 

In the position occupied by the unit matrix I'm the original 
matrix there is a lower triangular matrix with elements m tj . 
The calculation can be completed by forming a back- 
substitution matrix of the following form: 

1 »12 "13 A \\ A \2 *13 

1 u 23 A 2i A 22 A 23 

I Am A 32 A 33 . 

The A,j are the required elements of A''. Checks are 
given by 

S i A, J +S 2 A 2J +S 3 A 3J =1, 0=1, 2, 3). 

The square matrix on the left of the back-substitution 
matrix is simply the upper triangular matrix V. The 
elements A u are obtained in the order: last row from left 
to right, second last row from left to right, and so on. 
The last row is the same as the corresponding part of the 
last row in the auxiliary matrix. The second last row is 
given by the formula 

A 2J = m 2j -u 23 A 3J , (J = 1, 2, 3), 

where of course m 23 is taken to be zero. The general rule 
for the element A,j is: Any clement is given by the cor- 
responding clement in the auxiliary matrix less the sum of 
products of previously computed elements in its column 
and corresponding elements in its row in the U matrix. 
The /th element in its column is taken with the ;th element 
in the row of the U matrix. 

The method extends directly to the general nx.n case. 
It is seen that the amount of labour involved in finding an 
inverse matrix is about three times the work involved in 
solving a single set of equations, independent of the value 

Ex. 5.5. Find A' 1 for the matrix A of the equal ions in 
Ex. 4.2. 

Part of the auxiliary matrix has already been given in 
Ex. 5.2. 

A' l = 

















Ex. 5.6. Solve the equations in Ex. 4.2 by the formula 
x = A~'h, using the inverse matrix found in Ex. 5.5. 

Roots: -vi -5-3822, a- 2 = 2-8157, A- 3 =lI-5627,^=- -6-3607. 
Errors: &vi= -00046, Sx 2 - -00020, S.v 3 =00102, S.v<= -00046. 

Note that in this example the individual terms of A~ l 
are large compared with the final roots. This indicates that 
the equations are ill-conditioned since in the sum of products 
required to form the roots large terms cancel to give small 
end-results. On the other hand the degree of ill-conditioning 
is not excessive. 

We make the following comments in connection with 
the discussion of pivots and scaling in §4.3. The elements of 







A~ l in Ex. 5.5 are comparable with each other in size. 
Since we arc using fixed-point working and the elements of 
A are comparable this means that the errors of the elements 
of A' 1 should be comparable. If there is considerable 
variation in the sizes of the largest elements in the rows 
(or columns) of an inverse we have to be careful that we are 
not losing accuracy unnecessarily in the sense that the 
accuracy of the inverse could be improved by scaling the 
original matrix. In the method described above for in- 
verting a matrix we can obtain various choices of pivots by 
rearranging the rows and columns of the original matrix. 
Whatever choice of pivots is used the inverse of the last is 
always an element of the inverse matrix. If the last pivot 
is small then at least one clement of the inverse matrix is 
large, and the equations are to this extent ill-conditioned. 
This is one way of seeing that the size of the last pivot has 
a second-order effect on accuracy. 

The condition of the equations Ax = b depends on the 
matrix A and not on b. The equations are ill-conditioned 
if, when we check that A' 1 A is equal to /, then terms large 
compared with unity have cancelled to give (approximately) 
units along the principal diagonal and zeros elsewhere. 

If B is a matrix such that BAsal then we say that B 
is an approximate left-inverse of A, and similarly if ACxl 
the matrix C is an approximate right-inverse of A. Due to 
ill-conditioning, even if ACxI the product CA may not be 
a good approximation to /. As an example consider 

A = 


ri-oo looi, c = r-89 ioo~i. 

|_100 0-99J L 90 - 100 J 

= no oo~i, cA = r n 101. 

|_oi ioJ |_- 10 ~ 9 J 

A good right-inverse may not be a good left-inverse, and 
conversely. The situation in numerical work is quite 

different from the situation in the algebraic theory where 
the left- and right-inverses arc the same. 

We conclude this chapter by expressing in terms of 
matrices the discussion at the end of §4.4 concerning 
accuracy when solving linear equations. We define a 
column matrix of residues, r, by the equation 

r = AX-b, 

where A' is an approximate set of roots. If the exact roots 
x are related to the approximate roots by the equation 

x = X+ 8x, 

where 5x is a column matrix of errors, we readily find 

A8x = -r, i.e. 5x = -A' l r. (5.12) 

This is the matrix form of (4.22). 

If errors in the coefficients are given by 

8A = [Sa.j], 5b = [5b,], 

and the corresponding new roots are denoted by x + 5x, 
we have 

(A + SA)(x+8x) = b + 8b. 

On neglecting the second-order term SASx we find 

8x = A- , (Sb-5Ax). (5.13) 

This is the matrix form of (4.23). 

When using (5.12) it is essential to evaluate the residuals 
r to the number of significant figures required in Sx, and 
since the residuals are small this may mean using a large 
number of decimal places when calculating the residuals. 
On the other hand A~ ' need only be known to the number 
of significant figures required in 8x, which may mean that 
only an approximate determination of A~ x is required. If 
an approximate value of A' 1 is known then the roots of 
Ax = b can be evaluated as accurately as required by the 





x — \" )approi "> 

*«+i = x,-(A- l ) tnm r„ (s = 0, 1, 2, ...), (5.14) 

where x s is the sth approximation to x. The residuals must 
be evaluated with the appropriate accuracy at each stage. 
This is an illustration of the principle that in an iterative 
procedure it is often possible to use approximations in 
certain parts of the computation, provided that the requisite 
accuracy is maintained in certain vital parts of the calcula- 

Ex. 5.7. Check the results of Exs. 4.4, 4.5 by using the 
inverse matrix in Ex. 5.5 in conjunction with (5.12), 

When working with simultaneous linear equations it 
often happens, as in Ex. 5.7 compared with Exs. 4.4, 4.5, 
that we can either evaluate z = A~ x d if A~' is available, 
or we can solve Az = d where the major part of the work 
has already been done since another set of equations Ax = b 
has already been solved. To some extent the choice is a 
matter of personal preference. In hand computing it may 
be preferable to work with A' 1 since multiplication by A' 1 
is rather more straightforward than adding an extra column 
to an auxiliary matrix and back-substituting. Also the 
size of the terms in A~ x gives some indication of the 
condition of the matrix. When using a computer it may be 
preferable to solve an extra set of equations, but this of 
course depends on what is stored inside the machine. 

Examples V 
Ex. 5.8. Write a computer program (or a flow- 
diagram) for the solution of linear equations by the compact 
method of § 5.2. 




Ex. 5.9. Write a computer program (or a flow-diagram) 
for the inversion of a matrix using 

(i) the compact method explained in connection with 


(ii) the successive elimination method of § 4.2 (compare 
Program 4.1). 

Ex. 5.10. A compact method which is sometimes 
useful when it is desired to avoid small pivots is the square- 
root method in which we choose /„ = u„ in the decomposi- 
tion A = LU. Work out the details and show that when A 
is symmetrical U is equal to the transpose of L. Apply the 
method to the example in Ex. 4.2 and show that the errors 
are about two-thirds of those by the method of Ex. 4.2 
(Table 4.1). (The labour involved in taking square roots is 
justified only if otherwise the pivots become undesirably 
small in the sense discussed in §4.3.) 

Ex. 5.11. Show that the band or tridiagonal matrix A 
defined by 

a,j = for | i-j \ > 1 

can be decomposed in the form LU where L is a lower 
diagonal matrix with /, ; = for i—j> 1 and U is an upper 

diagonal matrix with « (i = 1 
Find L and U for 

and Ujj = for /— /> 1 . 

r 2 











A = 

Consider the procedure in Ex. 4.9 from the point of view 
of the compact elimination method of § 5.2. 

Ex. 5.12. Find the exact inverse of the matrix A given 
at the end of Ex. 5.11, using the compact method explained 




in connection with (5.11), expressing all non-integral 
numbers as fractions instead of using decimals. 

Ex. 5.13. Show that (AB)' = BA', and {AB)~ i 
= B i A~ 1 if both inverses exist. 

Ex. 5.14. Evaluate det A, the determinant of the 
coefficients a tj , for the example in Ex. 4.2, to four significant 
figures. (The pivots in Ex. 4.2 where five decimals were 
carried in the intermediate calculations give det AkO-3173 
x 10~ 4 . A similar calculation carrying six decimals gives 
det Ax0-3l67x 10 -4 . Keeping one extra decimal place 
does not mean that the error is reduced by exactly one 
decimal place, since round-off errors occur at random. 
But the results indicate that the last result is correct to 
one digit in the fourth significant figure. How could the 
accuracy be checked if you were using a digital computer?) 

Ex. 5.15. Suppose that we arc given in pairs of points 
(xj, yj) and it is required to represent the numbers y t 
approximately by a relation of the form y s «f(xj) where 

Ax) = -,/ 1 (-v) + z 2 / 2 (j:)+... + r n / n (.v), (/,<»,), 

where the/,(.v) are known functions and the z, are constants 
which are to be determined. This procedure is known as 
curve-fitting. We set 


*» = X z ifix,)-y s , (.v = 1 to m), 
i= i 

and determine the z, by the principle of least squares. We 

Partial differentiation with respect to Zj(j = 1 to n) gives n 
equations in /; unknowns which can be written, in matrix 





A' Ax = A'y, 

a = K-] = [f/xdl t = M, y - [>'<]. 

and A is m x «, z is n x 1, y is m x 1 . 

These equations can be badly conditioned, especially if 
the points (pc Jt y t ) tend to group in clusters. In such cases 
it may be advisable to replace a cluster of say p points by a 
single point at the centre of gravity of the cluster, repeated 
p times. 

Fit a quadratic to the following points: 

0-52 1-22 206 3-47 400 4-60 
2-90 1-72 1-20 200 2-61 405 


Ex. 5.16. Show that the inverse of the matrix of 
coefficients of the equations (5.7) is 

"27 -271 -19 1801 

21 67 13 -60 

3 81 9 -30 

12 -26 -14 30 


Ex. 5.17. The effect on the solution of a set of simul- 
taneous linear equations Ax = b of modifying one of the 
coefficients a i} can be analysed in the following way. 
Suppose that we know A~ l = [Ay] and we wish to solve 
(A + D)x = b where D is an nxn matrix with typical 
element d,j such that d iS is zero unless / = r,j = s, i.e. cl ri is 
the only non-zero element of D. Then 

(I+A- , D)x = A-'b. 
If x* is the solution of Ax* = b show that 




Xj = Xj-A Jt d rs x s , (j * s). 




Show that the solution of (5.7) with the second equation 
replaced by 

-x 2 + 2x 3 +x 4 = -4 
is given by 

23.v, = 501, 23.v 2 = -136, 23v 3 m -150, 23v 4 = 72. 



§6.1. Introduction. Many of the applications of 
matrices involve eigenvalues and eigenvectors. To intro- 
duce the ideas which are involved we discuss a simple 
physical example. Further applications occur, for instance, 
in connection with the principal axes of a quadric, and the 
principal moments of inertia of a solid body.f The reader 
who does not find physical examples illuminating can start 
with the mathematical definition (6.3) below. 

Consider three particles, each of mass m, at equal 
distances / along an elastic string of length 41, where the 
string is fixed at both ends. If the particles execute small 
transverse vibrations under no external forces, the equations 
of motion are 




d 2 X x 

dl 2 

d 2 X 2 _ 
dl 2 

d 2 X- 

_ TX t T(X 2 -X X ) 

T(X 2 +X,) T(X 3 - X 2 ) 

dl 2 

1 _ _ 

T(X 3 -X 2 ) _ TX 3 
I I ' 

t See W. H. McCrea, Analytical Geometry of Three Dimensions, 
Oliver and Boyd (1960) and D. E. Rutherford, Classical Mechanics, 
Oliver and Boyd (1960). For mathematical properties or eigenvalues 
and eigenvectors (Latent roots and vectors) sec A. C. Aitken, Deter- 
minants and Matrices, Oliver and Boyd (1962). 





where X u X 2 , X 3 are the displacements of the three 
particles and T is the tension in the string. If all quantities 
vary sinusoidally with time we can set 

X, = x,e"°<, (r = 1, 2, 3). 

Then the above equations become 

(2-;.).v,- x 2 =0, 

-.v,+(2-A).y 2 - .v 3 = 0, (6.1) 

-.v 2 + (2-A).v 3 = 0, 

where A = co 2 ml/T. This is a homogeneous set of simul- 
taneous linear algebraic equations which in general possesses 
only the trivial solution x, = x 2 = .v 3 = 0. However, 
non-zero solutions will exist for special values of A. From 
Cramer's rule (4.2) for the solution of linear equations in 
terms of the ratios of determinants we deduce that a necessary 
condition for the existence of a non-zero solution of (6.1) 
is that the determinant of the coefficients is zero: 

2-A -1 

■1 2-A -1 

o -l 2-;. 

= 0. 


By expanding this determinant we obtain a cubic equation 
in A which can be solved to give the three special values 
of A for which non-zero solutions of (6.1) exist, namely 

A = 2-^2, 2, 2 + V2. 

If in (6.1) we set A = 2—^2 then we find that the resulting 
equations have a solution x, = C, x 2 = Cyj2, x 3 = C, 
where C is any arbitrary constant. The physical meaning 
of this is that corresponding to A = 2— N /2 there is a free 
vibration of angular frequency io given by co 2 = 7(2 — N /2)/w/, 
and corresponding to this frequency of vibration there is a 
mode of oscillation such that the ratios x t : x 2 : x 3 are given 




by 1:^/2:1. Similarly 

for A = 2, x,:x 2 :x 3 = 1:0:— I, 

forA = 2 + 72, x,:x 2 :x 3 = I: -y/2:l. 

This analysis shows that there are three and only three 
frequencies of free vibration. Corresponding to each of 
these frequencies there exists a solution of equations (6.1) 
which gives the corresponding mode of oscillation. The 
modes for this particular case are shown graphically in 
Fig. 6. 1 . The reader should visualise these modes physically. 

We now express these ideas in more precise mathematical 
terminology. If A is an n x n matrix then, generalising (6. 1), 
we consider the equations 

04-A/)x = 0, or Ax = Xx. (6.3) 

From Cramer's rule (4.2) these equations possess only the 
trivial solution x = unless the determinant of the co- 
efficients is zero: 

det(.4-A/) = 

fl,,-A fl l2 
«2I «22-' 1 ••■ 






= 0. (6.4) 

This is a polynomial equation in A. The roots of this 
algebraic equation are special values of A for which the 
simultaneous equations (6.3) possess non-zero solutions. 
They are called the eigenvalues of the matrix A and will be 
denoted by A,- (1 = 1 to n). The Aj are sometimes called 
latent roots, characteristic roots, or proper values. Cor- 
responding to each of the ;., there will be a solution of (6.3) 
of the form Or, where x, is a non-zero vector and C is an 
arbitrary constant. These solutions are called the eigen- 
vectors (or latent vectors, characteristic vectors, proper 
vectors). We have 

Ax x = A,*,. (6.5) 




Wc now deduce some properties of the eigenvalues and 
eigenvectors for the special case where the a ti are real and 


^3 = 2+72 

A, A j A-j 

Fig. 6.1 Modes of vibration of three panicles on a string 

A is symmetrical. We first of all show that then the eigen- 
values and eigenvectors are real. In the general case, 
since the ?., are roots of an algebraic equation, the eigen- 
values may be complex and therefore the elements of x, 




may be complex. Denote by x, a vector whose elements 
are the complex conjugates of those of x,. The transpose 
x J of x, is a row vector whose pth element is the />th element 
of the column vector x,. Multiplication of (6.5) by x\ gives: 

X, = x',4x,/x",x,. 


(We adopt the convention that if y, z are n x 1 column 
vectors with elements ;>,, z t , (;' = 1 to »), then y'z denotes a 
scalar defined by 

y'z - ?iif+JUb+.«.+Jfti 

the inner product of y and z. Strictly speaking y'z is a 1 x 1 
matrix and the inner product should be denoted by some 
special symbol such as x.y or (x, y) but no confusion will 
arise from our usage.) On writing x',x ; and x'lAx-, in full 
it is seen, on using the assumption that A is symmetrical, 
that both of these expressions are real. Hence the X, are 
real. Corresponding to each of the /.,, equations (6.3) will 
possess a solution of the form Cx, where the elements of 
x, are real. From now on we can therefore drop the bars. 
If we multiply (6.5) by x}, 0' ^ /), write down the 
corresponding expression with / and j interchanged, and 
subtract the two expressions, we obtain, since Xj<4x, =x\Ax l 
and x'jx, = x,x„ 

(;.,-;», = 0, (i*j). 

We shall assume that the eigenvalues are distinct, i.e. 
A, / /,. Then 

x.'x ( = 0, (i*j). 


If this equation is satisfied the vectors x ( and x ; are said to 
be orthogonal and the eigenvectors x, form an orthogonal set 
of vectors. Since the elements of an eigenvector are derived 
from homogeneous equations these elements can be multi- 
plied by an arbitrary constant. It is usually convenient to 




adjust the size of the elements by some standard rule. This 
procedure is called normalisation and it can be done in 
various ways. We shall use two kinds of normalisation in 
this chapter: 

(i) The eigenvectors can be normalised so that the 
numerically largest element of each eigenvector is 
(ii) An eigenvector x, with elements d,, d 2 , ... d„ can 
be normalised so that 

x\ Xl = d] + d\ + ... + dl=\. 

In this case we say that the eigenvectors are 
normalised to have unit length. 

If a set of vectors is such that it is impossible to find 
constants a,, not all of which are zero, such that 

a,x,+a 2 x 2 + ...+a fl x ll = 0, 


then the vectors are said to be linearly independent. A 
useful deduction from this definition is that if a relation of 
the form (6.8) holds between linearly independent vectors 
then a, = a, = ... = a„ = 0. Orthogonal vectors are 
linearly independent for, if a relation of the form (6.8) 
exists, we see on multiplying by x', that a, = and this 
holds for all ;'. 

An arbitrary n x 1 vector u can be expressed uniquely as 
a sum of multiples of n orthogonal «x 1 vectors: 

u = a i x x +a 1 x 2 + ... + a n x„ 


where the a, are suitable constants. For if the a, are regarded 
as unknowns then (6.9) is an n x n set of simultaneous linear 
equations for the a,. The determinant of the coefficients 
is non-zero since otherwise the sum of certain multiples of 
the columns of the determinant would be zero, i.e. (6.8) 
would be true, but we have just shown that this is impossible. 




Hence the determinant of the coefficients is non-zero, and 
values for the a, exist and arc unique. In fact we can easily 
write down the solution explicitly since multiplication of 
(6.9) by x\ gives 

a, = x'fl/x'iX,. 

F.x. 6.1. From the results for the problem discussed at the 
beginning of § 6. 1 deduce that the matrix 

A - r 2 - 1 0] 

-1 2-1 (6.10) 

.0-1 2. 

has eigenvalues /, =2-^/2, }. 2 = 2, ;. 3 = 2+^/2, 
with corresponding eigenvectors, normalised to unit 

x, = 

' i 1 

. x 2 = 


. X 3 m 

i 1 



L * . 


L i . 

// Is readily verified that x\Xj = 0, (i # j). 

§ 6.2. An iterative method for finding the largest eigen- 
value. In the last section we obtained the eigenvalues for 
a particular example by direct expansion of the determinant 
(6.2). This will not be a very efficient method for deter- 
minants of large order. In this section we discuss an 
iterative method for finding the largest eigenvalue of a 
general n x n matrix. 

We assume that the matrix is symmetrical and real, 
since this is the case covered by the theory in §6.1. The 
eigenvalues arc then real, and we shall assume that there is 
one eigenvalue of maximum modulus. An extension to the 
case where there are two eigenvalues of the same modulus, 
but opposite signs, is given in Ex. 6.7. The method given 




ill this section can be extended to other cases (but these will 
not be examined in detail): 

(a) If the matrix is real but unsymmetrical, it is necessary 
to introduce biorthogonal functions as in Ex. 6.9. The 
method given in this section applies if there is one eigen- 
value of largest modulus. In this case the eigenvalue of 
largest modulus will automatically be real, since, if the 
matrix is real, complex eigenvalues always occur in con- 
jugate pairs. A method for dealing with complex conjugate 
dominant eigenvalues is given in Ex. 6.8. 

(b) If the matrix has complex elements, it is relatively 
easy to extend the theory in § 6.1 to the case of a Hermitian 
matrix, where the elements are complex, but a }t = a,j. 
The eigenvalues are real, and the method given in this 
section applies if there is one eigenvalue of maximum 

(c) For the general complex matrix the method, given in 
this section applies, provided there is one eigenvalue of 
maximum modulus, though, as in (a), it is necessary to 
introduce biorthogonal functions in the theory. 

Suppose that « is an arbitrary ;jxl column matrix. 
By successive premultiplication by A form the sequence 

"i = Au > "2 = ^«i = ^ 2 «o. 

u p = Au p . 1 = A"u . (6.11) 

(The suffix p in u p is used in a different sense from the 
suffix /' in x,. This distinction should be borne in mind 
throughout this section.) On assuming a representation 
of the form (6.9) for u and using (6.5) we see that 

u, = Au Q = a l X i x l +a 2 X 2 x 2 + ... + a„X n x n , 

u p = A"u = a l X\x[+a 2 X p 2 x 2 + ... + a n X" n x n (6.12) 

">*■+,£ «'(f'Y 





Suppose next that | ;., | > | X 2 | ^ | X 2 | ^...£ | X a |. 
Then (Xi/X,)" tends to zero as p tends to infinity (/>1). 
If u p , x, arc the sth elements of u p , x, for some specific s, 

and we define 
we find 

', = *! 

I P = uJUp-l, 


J = 2 \X, 

1 = 2 \X { 


»^i as p-*co, (6.15) 

where B, = a,x,/a 1 x l . In deriving this result we have 
made two important assumptions: 

(i) There is only one eigenvalue with largest modulus. 

The general case is outside the scope of this 

book (cf. Exs. 6.7, 6.8). 
(ii) The trial vector u is such that a, ^ 0. For rapid 

convergence we should try to choose u so that 

a, is as large as possible relative to the other a,. 

The approximate form of « is often known on 

physical grounds. 

In numerical work, instead of working with the sequence 
u p it is convenient to form the sequences z p , w„, starting from 
an arbitrary vector z = / m> , by means of the equations 

Aw„. t = z p = l p w p , ip = 1,2,3, ...), (6.16) 

where l p is now defined as the largest element of z p , so that 
the largest element of w p is unity. Suppose that the largest 
element of x, is the 5th element and that x, is normalised 
so that this element is unity. A minor difficulty arises in 
the theoretical analysis because for small p the largest 
element of z p (and hence the unit element of w p ) may not 
be the sih element due to the preponderance of com- 
ponents other than x, in the initial trial vector. But we 




know, from assumption (ii) above, that eventually the 
effect of Xy will predominate. Hence without loss of 
generality we can assume that the slh element of z p is always 
the largest element. We write 

where we denote the 5th element of x,(i> 1) by x h and the 
a, arc constants whose precise values are not important. 
From (6. 1 6) we find 

H>, = | 

*.+ Z «.(£Y4/{l+ t +($f*A> M7) 

= 2 


1 = 2 


As before l p tends to X, as /? tends to infinity. Also w p 
tends to x { . 

If convergence is slow it will be desirable to use extra- 
polation. In order to simplify the analysis we make a 
third assumption: 

(iii) X 2 * -X 3 . (6.18) 

(We have already assumed | X% \ e£ I *3 I- This new 
assumption imposes an additional restriction.) On using 
this assumption we have, for sufficiently large p, 


[i+ct 2 x 2 {). 2 ix i y 'J 

where k = cc 2 x 2 (X 2 — ^i)l^2- An improved approximation 
/., for /, can be obtained by setting 

/, = Mi +*(;.,/;.,)"}. (6.19) 

On replacing p by p— 1 and eliminating k we find 

L l -l p = (X 2 /X l )(L l -/,-,). (6.20) 




Hence in the terminology of § 2.4 we are dealing with a 
first-order iterative procedure. Convergence can be 
accelerated by using Ailken's (5 2 -process (2.15). (Compare 
(6.20) and (2.13).) An improved estimate of X t is therefore 
given by 

in - 'p- r 9 v l, "fi ■ (62,) 

'p-^'p-l + 'p-2 

In exactly the same way (6.17) gives 

m> p «x x + k(X 2 IX l y, where k = a 2 {* 2 - WiM 2 )x 2 .v,}. 

Hence convergence of the successive approximate eigen- 
vectors can also be accelerated by using Aitken's <5 2 -process. 
But it must be remembered that the above analysis is not 
valid if X 2 = - ?. 3 , and the extrapolation will not be very 
accurate if X 2 is approximately equal to — X 3 . 

In applying Aitken's formula it is essential that no 
numerical mistakes be made in computing the iterates 
used. For the special case when A is symmetrical a 
numerical check on successive iterates is given by the 
following formula derived from (6.16): 


'p+j - /,»> p . 


From (6.20) it is clear that the rate of convergence of the 
original first-order iterative procedure depends on XJX 2 . 
The larger this ratio the faster the convergence. From 
(6.19) it is seen that 



/ p - 1 -/p- 2 _ f 
1,-1,-1 • 



The value of l p can similarly be estimated from the com- 
ponents of n>p_ 2 , w p . ,, w p . In terms of t p Aitken's formula 
(6.21) is (cf. (2.18b): 



8 6.2 




In hand computation it is often convenient to compute /,, 
mentally after each iteration and extrapolate using Aitkcn's 
formula as soon as successive iterates begin to vary regularly, 
as indicated by agreement between successive values of t p . 
If successive values of t p show no signs of behaving regularly 
as p increases this indicates that some of the basic assump- 
tions (i)-(iii) in the above analysis are wrong (cf. Exs. 6.7, 

Ex. 6.2. Obtain the largest eigenvalue and the corresponding 
eigenvector of the following matrix, working to four 
decimal places: 

A = 


















Iteration with h-' = [1,1, 1, 1] gives 

z* = 2-7386 


, z 7 = 2-7381 


,z 8 = 2-7379 



The numerical check (6.22) gives / 8 h>> 8 = 2-8770 8 and 
l 1 w' 1 w 1 m 2-8770 2 . The following values of f 3 .../ 8 are 
found: -17, -2-2, 12, 11, 30, 2-5. The last two values 
are not very accurate due to round-off error. However, 
the values of t p indicate that successive iterates are beginning 
to behave regularly when p = 8 and this can be confirmed 
by estimating t p from the components of the eigenvectors. 
Since the round-off error is starting to become appreciable 
compared with the differences between successive iterates 
z 6 , z 7 , z 8 it is advisable to apply Aitken's <5 2 -process at this 

point. (If we were working to five or more decimal places 
we might iterate again to confirm the regular behaviour.) 
We obtain 

x, = 

0-0198 + 00012" 


f"0 0210"| 

0-1825 + 0-0016 

01 841 


01 265 



A further two iterations give 
A, = 2-7376, .v, 

01 265 

Further iteration merely repeats these values which therefore 
represent the final estimate of the solution. 

§ 6.3. The determination of subdominant eigenvalues and 
eigenvectors. When the largest or dominant eigenvalue 
and the corresponding eigenvector of an nxn matrix has 
been obtained various methods are available for replacing 
the matrix by an («-l)x(«-l) matrix with eigenvalues 
k 2 ---X n . The largest eigenvalue and corresponding eigen- 
vector of this new matrix can then be determined. The 
procedure can be repeated until all the eigenvalues and 
eigenvectors of the original matrix have been determined. 
We confine our attention to the case of a real symmetric 
matrix with real, distinct, eigenvalues. 

The method described below applies only to symmetric 
matrices. It depends essentially on the orthogonality 
property of eigenvectors given by equation (6.7). The 
method is described sufficiently clearly by an example, and 
we give no general discussion. Other methods for removing 
known eigenvalues and eigenvectors from a matrix are 




described in Modern Computing Methods, H.M. Stationery 
Office, 2nd Edn. (1961), Chapter III. 

Ex. 6.3. Suppose that the second largest eigenvector of the 
matrix used in Ex. 6.2 is given by x ' 2 = [x„ x 2 , x 3 , x 4 ]. 
From the equation x 2 x, = using the value found in 
Ex. 6.2 for x, we find 

x 4 = -0-0207*, - 0- 1 836* 2 - 0- 1 265* 3 . (6.25) 

If we write out in full the first linear equation of the matrix 
system Ax = ;.x where A is the matrix of Ex. 6.2, we have 

-0030.y 1 -0-242.v 2 -0-603x 3 +0-178.v 4 = Ax,. 

The unknown x 4 can be eliminated from this equation by 
means of (6.25). In the same way x 4 can be eliminated 
from the left-hand sides of the other three equations in 
Ax = Ax, and finally we obtain 

- 0-0337*, - 0-2747* 2 - 0-6255* 3 = be, , (6.26a) 

-0-2501*, +0-7878*2-0-3927* 3 - A* 2 , (6.26b) 

-0-608ZV, -0-3891*2 + 1 3182* 3 = A* 3 , (6.26c) 

01 236*, - 00899*2 - 008 1 7* 3 = Ax 4 . (6.26d) 

The first three equations represent an eigenvalue problem 
for the 3 x 3 matrix 

B = 

-0-0337 -0-2747 -0-62551 
-0-2501 +0-7878 -0-3927 
L-0-6082 -0-3891 +1-3182 


Suppose that the largest eigenvalue and the corresponding 
eigenvector of the matrix (6.27) has been found by the 
method of § 6.2 or otherwise. From the values of*,, * 2 , * 3 
the value of * 4 can be found from (6.25) and the final values 
of the *, can be checked from (6.26d) or vice-versa. In this 


way we obtain 


A 2 = 1 -6500, x 2 = 





Suppose next that \y u y 2 , .v 3 , v 4 ] denotes the transpose 
of one of the two remaining eigenvectors x 3 or x 4 . This 
vector is orthogonal to x, and x 2 which are now known, so 

-0-3119j,-0-3650>'2-l-l-0000.v 3 -0-0530v 4 = 0, 
0-0207j,+0-1836>'2 + 0-1265v 3 + l-0000.v 4 = 0. (6.28) 
Elimination of j- 4 between these two equations gives 

>- 3 = 0-3087j,+0-3529;' 2 . (6.29) 

On recalling the method of derivation of (6.26) we see that 
(6.26) still hold if the x, are replaced by ;>,. If in (6.26a-c) 
we eliminate y 3 by means of (6.29) we obtain 

-0-2268><,-0-4954>-2 = A;-„ 

-0-3713^, +0-6492J, = Av 2 , 

-0-2013j,+0-0761.V2 - ty* (6.30) 

The first two equations represent an eigenvalue problem for 
the 2x2 matrix 

c = r- 


2268 -0-4954 
3713 0-6492 


The easiest method of procedure is now to solve the 
eigenvalue problem det [C-A/] = by expansion of the 
determinant and direct solution of the resulting quadratic. 
Back-substitution in (6.29) and (6.28) give the remaining 
components of the eigenvectors of the original matrix A 




and (6.30) is a check. This gives, on normalising so that 
the largest elements are unity, 

X 3 = 0-8241, x 3 = 



: ;. 4 = -0-4017, x 4 = 





§6.4. The iterative solution of linear simultaneous 
algebraic equations. It is convenient to consider certain 
iterative methods for the solution of linear equations at this 
point, since the convergence of these procedures can be 
discussed by means of the theory of eigenvalues and eigen- 
vectors. We start by stating two classical iterative pro- 
cedures, considering for simplicity the case of three equations 
in three unknowns. It is assumed that the elements in the 
principal diagonal of the coefficient matrix are non-zero. 
Without loss of generality these can be taken to be unity. 

(a) The equations for the method of simultaneous 

corrections are 

xT»>« -»ia*P -a n x?+b„ 

-v ( / +,) = -a 2 ,x[ r) -a 23 x<? + b 2 , (6.31) 

X 3 — — a 31 \ t — "32*2 +O3. 

(b) The equations for the method of successive correc- 

tions are 

*r»- -a.2*';' -«.3*3 r) +v 

X2' +,> = -fl21* ( i r+ " -«23*3 r) +*>2. (6-32) 

x 3 '+»=- a3l xr»-a 32 x 2 '+» +b 3 . 

The new estimates of the x t are used on the right 
as soon as they are obtained. 




A general class of iterative procedures, which includes 
these methods as special cases, can be discussed in the 
following way. We solve the equations 

Ax = b, (6.33) 

where A is an nxn matrix, by splitting the matrix A in the 
form A = E-F. Then 

Ex = Fx+h, (6.34) 

and an iterative procedure is derived by writing 

Ex (r +" = Fx" + b, (6.35) 

where x ( " is the rfh approximation to the exact value x. 
The iteration is started by setting x {0> equal to an arbitrary 
column matrix. If it is possible to obtain an approximation 
to the exact solution x it may be convenient to take x (0) 
equal to this estimate of a: but often it is sufficient to choose 
x (0 ' = 0. The matrix E should be chosen so that it is easy 
to solve the set of equations (6.35) for x (r+1 \ assuming that 
x (r) is known. This implies that det E is non-zero, and E~ ' 
exists. We have seen in Chapter V that it is easy to solve 
sets of equations if the matrix of coefficients is of lower 
triangular form, and E is often chosen to be of this type. 
An example is the method of successive corrections 
quoted above. 

Instead of working in terms of the estimated values of the 
unknowns it is useful to introduce the corrections 

c ('+l) _ x lr * l) -x (r> . 


As in the discussion of other types of iterative procedure in 
Chapter II, if x (,) tends to the required value x as ;• lends 
to infinity we say that the procedure is convergent. If the 
procedure is convergent 

x = x (0) + £ c<". 

r = 1 


A necessary condition for convergence of the infinite series 




in this equation is that c (r) should tend to zero as r tends 
to infinity. On writing r— 1 in place of r in (6.35) and 
subtracting the result from (6.35) we see that 

c (r+1) = Kc ir \ where K = E~*F. (6.38) 


c (r + » = /r H c <o> (639) 

The convergence of the procedure therefore depends on the 
behaviour of A' f as r tends to infinity. This can be expressed 
in terms of the size of the eigenvalues of K by the following 
argument which should be compared with the analysis at 
the beginning of § 6.2. We consider only the case where 
the matrix K possesses n linearly independent eigenfunctions 
2, with corresponding eigenvalues ). t , so that 

A'z, = ;.,:,., or (F-/.,E)z ( - 0. (6.40) 

It is assumed that there is one eigenvalue of maximum 
modulus, say ). u where ?., is real. The other eigenvalues 
can be complex. From (6.9) the vector c (0) can be written 
in the form 


fl,: l +fl 2 z 2 + ... + fl„:„. 


From (6.39), (6.40), (6.41), 

c ,r+l > = fll ;/ 1 +, 2l +fl 2 A 2 +| Z2 +...+ a „;: n +, v (6.42) 

For large r, R, since )., is the eigenvalue of largest modulus, 

cfr + » « «yg"«t, c (R+s) * X\c w . (6.43) 

If we perform R iterations and then estimate c (R + ,) for 
s>0 by means of this result, equation (6.37) gives 

x = x (0) + £ c"+c*, 


r "TF+ 1 

c"> m c"° 

S-l 1-A, 






We have assumed that \?. t \ < I, and this is obviously a 
necessary condition for convergence of the procedure. 
This formula provides an estimate of the error at any 
stage of the iterative procedure provided that there is one 
real eigenvalue X, which is larger in modulus than the 
others, and the approximate value of ;., is known. From 
(6.43) the value of A, can be estimated from the ratio of 
corresponding elements of successive c (0 . The consistency 
of the values of /., obtained at various stages of the iteration 
is an essential check on the validity of the basic assumption 
that there is one eigenvalue which is larger in modulus 
than the others. 

We define the rate of convergence /> by the formula 

P = - In I A, |, 


where ?., is the eigenvalue of AT with the largest modulus, as 
defined in the last paragraph. To illustrate the significance 
of/) we introduce the error vector 
c w m x (r)_ x 

Subtraction or (6.33) from (6.35) gives 

e (r+ " = Ke< r) = tf r+1 e (0) . 

As in the analysis leading to (6.43) we can show that for 
large r the error in any component of the rth iterate will be 
nearly equal to /.', />, where p is some constant. The number 
of iterations required to reduce the error in this component 
to, say, e is given by 

Taking logarithms 

r In (| p l/e) m In (| p |/«) 

-to fil p 

Hence the number of iterations required is inversely 
proportional to the rate of convergence. The smaller the 
value of | Xi \ the greater the rate of convergence. 




We can compare the rates of convergence of different 
iterative procedures for solving the same set of equations 
if we can compare the magnitudes of the eigenvalues of 
maximum modulus of the corresponding matrices. An 
instructive elementary discussion of rates of convergence 
can be given for the special case of a band or tridiagonal 
matrix, the elements of which are zero except on the 
principal and immediately adjacent diagonals. Without 
loss of generality we can assume that the principal diagonal 
elements are unity so that we can set 

A = L + I+U 

where / is the unit matrix and in the 3x3 case we can set 

A m 

1 t h 

I, 1 

.0 h 


, L = 

ro o o- 

, u = 



1 . 

.0 i 2 o. 



The methods of simultaneous and successive corrections, 
equations (6.31), (6.32), can be written, respectively, 

x ( ' +,) = -(L+U)x< r) +b, (6.46) 

(I+/)x <r+1) = -Ux iri + b. (6.47) 

We examine a generalisation of (6.47) which, as explained 
later, is known as the method of successive over-corrections: 

(L+pl)x (,+ u = -{(l-p)I+U}x ir) + b, (6.48) 

where p is a constant which will be determined so as to yield 
the fastest rate of convergence. The method of successive 
corrections is the special case/? =1. We shall compare the 
rales of convergence of (6.46) and (6.48). From (6.34), 
(6.40), (6.45) this means that we need to compare the 
eigenvalues A, p defined by the equations 

(6.46): [_(L+U)+Xr]u = 0, (6.49) 

(6.48): [{(!-/>)/+ £7}+/i(I + pJ)> = 0. (6.50) 




By means of the following procedure we can show that 
/. and p are directly related. Write (6.50) out in detail. 
Multiply the /th equation by x$ = 1, 2, 3) and set 

P = (l-p)+pp, 

Vj = PjVj, U= 1,2,3), 

where the V t are new unknowns. The a,-, /?, are constants 
which are at our disposal. The equations (6.50) become 

AfcAPk+VtatAT, =0, 

Ph«2PiVi + Pa z p 2 V 2 + u 2 a 2 p 3 V 3 = 0, (6.51) 
pl 2 cc 3 p 2 V 2 + Pa 3 p 3 V 3 = 0. 

We next identify these equations with (6.49). To make 
the terms off the principal diagonal agree, we require 

«lft = «2/»3 = 1. 

a 2 /?,/i = a 3 p 2 p = 1. 

By means of these equations, four of the six a„ fij can be 
expressed in terms of the remaining two. The way in which 
this is done is immaterial. Suppose that we set 

Then (6.51) become 

fetftW*?!* "l^ =o, 

/ 1 K 1 + (« 2 /«,)PK a + H 2 ' / 3 = 0, 

hV 2 H*J*i)Pn- l v 3 = o. 

These are identical with (6.49) if 

«l/«2 = /'*. V=u 
P = (l-p)+pp = f i*A. (6.52) 




First of all we consider the method of successive correc- 
tions, p — 1. Then (6.52) gives p = ?}. This result shows 
that in the case of a band matrix the eigenvalues for the 
method of successive corrections are the squares of those 
for simultaneous corrections. Hence for a given problem 
involving a band matrix the two methods both converge or 
they both diverge. If they converge, the rate of convergence 
is doubled if we use successive instead of simultaneous 

To determine the value of p which gives the fastest rate 
of convergence for any given '/. we discuss the dependence 
of the roots of the quadratic (6.52) for p i on p. When the 
roots are real we need to consider the root of larger modulus, 

I /«* I = (2/>)-{| X \ + Jtt 2 -4p+4p 2 )}, p l <P<P 2 , (6.53) 

where /7,,/j,are the roots of thequadratic4/> 2 -4/7 + ;. 2 = 0. 
ForO<|A|<l we hwe0<p,<±<p 2 <]. When the roots 
are complex 

| /i | = (/>-'- 1), p t <p<p 2 


By drawing graphs of | p | against p for various ;. it is easy 
to see that for convergence we need consider only p>\, 
and then | /< | decreases for \<p<p 2 , and increases for 
p>p 2 - Also the largest | p | corresponds to the largest 
| X |. Hence the rate of convergence is greatest when 



where ). x is the root of maximum modulus for the method of 
simultaneous corrections. From (6.53), (6.54) the eigen- 
value of maximum modulus corresponding to the optimum p 
is given by 

/ll =(2 i ,)- 2 ;. 2 = p-'-i 

= {1-V(1-A,)}{1+V(1-A 2 )}- 





The value of the method can be illustrated by supposing 
that A, = 1-e where e is small so that the methods of 
successive and simultaneous corrections converge slowly. 

lit x {1- V /(2£)}{1 + 7(2 K )}-' * 1-27(26). 

If A, = 0-995, e = 0005, then /i,ss0-8 and a remarkable 
increase in the rate of convergence is obtained. 
We can rearrange (6.48) in the form 

jCr-M) = X (0 +ft) {_ tx (r + l)_ (/+y)x (r)_ fc}) ( g 5?) 

where <u = I /p. If /> = <y = 1 we have the method of 
successive corrections and the bracketed term on the right 
is the correction added to x (r) to give x (r+l) . From (6.55) 
since U, | < | we have ±</><l, or to>], so that in the 
general method the bracketed term is multiplied by a 
factor greater than unity. This is the reason for naming 
the procedure the method of successive over-corrections. 

The following example illustrates numerically some 
aspects of the iterative procedures described in this section. 
An application of these methods is given in Ex. 11.9. 

Ex. 6.4. Obtain the results given in Table 6. 1 . for iterative 
solution of the following equations, working to five 
decimal places. 

20*,- 8x 2 =6, 

- 4.v,+20.v 2 - 4x 3 =6, (6.58) 

- 4a- 2 + 19jc 3 - 4.v 4 = 5, 
- 8x3+20*4 = 1. 

In all cases the iteration was started with the estimates 

20x«°' = 6, 20x 2 °> = 6, I9x 3 ' = 5, 20x 4 °> = 1. 


The eigenvalue of maximum modulus was estimated from 

*■ "{.?.**"}/{,?. 4 r=3 ' 4 - 

For the methods of simultaneous and successive corrections 
the results in Table 6.1 give/, %0-4and A, a 0-2 respectively, 
though the iteration in the latter case has not proceeded far 
enough to give an accurate estimate. The c* in Table 6. 1 

Table 6.1 

Results for the iterative solution o/(6.58). 
AH figures have been multiplied by 10 5 . 








x i 



correct. (6.31) 






491 81 







480 51 







408 89 







213 97 


Success, correct. (6.32) 













480 28 

+ 1 






409 25 







213 69 









492 11 


w= 1044. 





480 28 

+ 1 





409 26 




213 70 

are then obtained from (6.44b). For the method of succes- 
sive over-corrections we use, from (6.55) with A, = 0-4, 

co=\lp = 2{1 + 70-84}" 1 xl 044. 

In practice, of course, when to is estimated, we should not 
start from the beginning again with the original x\ 0) as we 
have done in Table 6.1, but we should start from the 
currently available estimates of the x,. 

In computing by hand, since the result is obtained by 





adding corrections and the accuracy of the extrapolation 
also depends on the accuracy of the corrections, it is 
essential to apply checks. Consider the method of successive 
corrections applied to a 3 x 3 set of equations in (6.32). 
On adding the equations and subtracting from the result 
the corresponding equations with r- 1 in place of r we have 

(l+«2 I +fl3.)ci r+,) +(l+fl32)4 r+1, +<-r ,) 

where c{ r) is the /th component of c (r) . A final check should 
also be applied by carrying out one iteration with the final 
estimates of the x,. 

Examples VI 

Ex. 6.5. Show that the sum of the eigenvalues of A equals 
the sum of the diagonal elements of A and the product of the 
eigenvalues is equal to det A. If the matrix A has distinct 
non-zero eigenvalues A, show that A~\A 2 have eigenvalues 
XT 1 , % respectively, and that A, A~\ A 2 have the same 

Ex. 6.6. Prove the following statements. The matrices 
A -pi and A have the same eigenvectors. Iff/ is an eigen- 
value of A -pi then p + ij is an eigenvalue of A. To find 
the largest eigenvalue of A by the first-order iterative 
procedure of § 6.2 it is in general quicker to iterate with 
A— pi rather than A, for some suitable choice of p. Suppose 
that /., is positive and that the eigenvalues are arranged in 
the order ^>Aa^-..^A, with | A, | > \X 2 | and 
I a, | > \X n |. The fastest rate of convergence is obtained 
when p is chosen equal to iU 2 + /„). For this choice of p 
assumption (iii) of § 6.2 is no longer valid and if it is 
desired to use Aitken's <5 2 -process it may be better to 
choose p = iU 3 + A„). The difficulty in applying these 
results is of course that / 2 , ;. 3 , /„ are usually not known. 




5 6.4 



When using an automatic computer it may be worthwhile 
to find a suitable value of p empirically. If it is known that 
all the roots are positive then a simple choice for /* is 
|Aj where A* is an approximate value for A t found by 
iteration, using A. The above device can be used to find 
the smallest eigenvalue X„ of A by choosing p = 4(A t + X n - 1) 
or iWi+^n-2)- Tnis procedure may not be very satis- 
factory if the lowest roots are close together, which is often 
the case. 

Illustrate these remarks by discussing the case of a fourth- 
order matrix with eigenvalues 4, 3, 2, 1. 

Ex. 6.7. Show that if ;., = -X 2 then, using the same 
notation as in § 6.2, 

«2«+i « A 2 '(a,x,-a 2 x 2 ). 

Deduce that if u p is any element of u p then uJu p - 2 lends to 
A? as p tends to infinity and show how x, and x 2 can be 

Show that if Ui | > U 2 1 and A 2 = -A 3 so that assump- 
tion (iii) of § 6.2 is not true, Aitken's <5 2 -process can be 
applied to even or odd order iterates, but not to any three 
successive iterates. 

Ex. 6.8. If the dominant eigenvalues of A are complex 
conjugate, suppose that they are the roots of A 2 + otX +P = 0. 
Show that for sufficiently large p 

«„+««„_, +/?u p _ 2 = 0. 

Deduce a method for finding a, ft, and hence the dominant 
eigenvalues of A. 

Ex. 6.9. In order to understand the emphasis on 
symmetrical matrices in this chapter we indicate how the 

theory of orthogonality of eigenvectors must be modified 
when A is unsymmetrical. Proofs of the following state- 
ments are left to the reader. A and its transpose A' have 
the same eigenvalues A, and we assume that these are 
distinct. Corresponding to each A, there are eigenvectors 
X, of A and j>, of A', and in general these are different: 

Ax i = AjX|, A y t = A|)>j. 

In general the x, are not orthogonal to each other, nor are 
the y h but 

ylxj = 0, (i + j). 

We say that the x, and y, are biorthogonal. An expansion of 
the form (6.9) exists for an arbitrary vector. Starting from 
arbitrary vectors u , v we can form two sequences 

u P = Au p _ l , t„ =^'b p .,. 
If u p , v p are any elements of u„, Vp then 

lim — ^- = A., lim 

P-co u„_ 

^ = A,. 


p-*to !>p_, 

Further discussion of the unsymmetrical case lies outside 
the scope of this book. 

Ex. 6.10. Show by doing the necessary numerical 
work that when finding the dominant eigenvalue of the 
matrix (6.10) by iterating with u' = [1, 1, I] the rate of 
convergence is more rapid than would be expected from the 
ratio XiUi. Show that if we use «(, =[1, 0, 0] the rate of 
convergence of the eigenvectors is given by X 2 IX { but the 
rate of convergence of the eigenvalue is again more rapid 
titan might be expected. Explain the reasons for these 


Ex. 6.11. Show that the eigenvalues of the matrix 

A = 







-a n -i ■■ 

• -a 2 


are identical with the roots of the algebraic equation 
;." + fl 1 A''- 1 + ...+a n = 0. 

(Many of the remarks made in previous chapters in con- 
nection with finding the roots of algebraic equations apply 
to the determination of eigenvalues.) 

Ex. 6.12. Show that if x is a first-order approximation 
to the eigenvector x, of A, so that x = x, +er where e is 
small, then 

L = 



= X t — Ce 2 + higher-order terms, 

so that L is a second-order approximation to Xf The 
quantity L is called the Rayleigh quotient. 

Ex. 6.13. Write computer programs for finding the 
largest eigenvalue and the corresponding eigenvector of a 
matrix A using the following methods: 

(a) Iterate using (6.16) until successive estimates of A, 
and the components of x, differ by less than a 
quantity e specified in a " Read " statement. If the 
procedure iterates more than 30 times without 
achieving convergence, print out the last two sets 
of results and stop the machine. 




(b) Instead of the criterion for convergence in (a) use 

the criterion that the quantity t p in (6.23) is 
positive until the " noise-level" of the calculation 
is reached (cf. Ex. 3.3). 

(c) Use the method in (a) with the following addition. 

Iterate until successive estimates of X x differ by 
less than 001 X* where X* is the current estimate 
of Xi. Compute the quantity t p in (6.23) and 
continue the iteration with (A-$X* I), (Ex. 6.6). 
Check that the new value of t p is greater than 
the old value. Otherwise continue iterating 

(d) Use the method in (a) with the following addition. 

Compute the quantity t p at each stage (p^3). If 
t p and t p _ j differ by less than 0- 1 t p apply Aitken's 
<5 2 -process and start the iterative procedure again. 

Discuss precautions that could be taken to avoid 
harmful effects due to the noise-level of the calculation in 
cases (a), (c), (d). (The fact that the noise-level can upset 
an orthodox iteration is one of the reasons why we may 
prefer to use the noise-level itself as the criterion for con- 
vergence, as in (b).) 

Ex. 6.14. Iterative methods for the solution of Ax = b 
are particularly useful if A is diagonally dominant i.e. if 
in any row of the matrix the diagonal element is greater 
than the sum of the absolute values of the non-diagonal 
elements. Consider the method of simultaneous correc- 
tions. Without loss of generality we can take the diagonal 
elements to be unity. If we define 

°i *• E I "» l« a = max a " 

j i 

where the sum is taken over all non-diagonal elements only, 
then for diagonal dominance a<\. Let e (r> (/) denote the 
/th element of the error vector c (r) = x-x (r) . If we define 


E, to be the maximum value of e (r, (/) for / = I, 2, ... n then 

l« w coi»iz«u«^ i, co|gB,-i2;ifl U i^B r . 1 . 


E r ^,oE r - x ^o'E . 

Hence the method of simultaneous corrections converges 
for diagonally dominant matrices and for such matrices 
the modulus of the largest eigenvalue is less than a. 

Ex. 6.15. If X is an approximate inverse of A, show 
that if we define 

X,+ , = X,(2I-AX r ), r = 0,1,2,..., 

then as r tends to infinity X r tends to A " '. The procedure is 
second-order. This is an analogue, for matrices, of an 
iterative procedure for finding the inverse of a scalar, 
x = \/a. (Set n = -I in the first formula in Ex. 2.10.) 
A necessary condition for convergence of the scalar pro- 
cedure is that | 1 -ax | < 1. The analogue for the matrix 
procedure is that the largest eigenvalue of the matrix 
I—AXq must be less than unity in modulus. 

Ex. 6.16. Write a computer program for the iterative 
solution of the equations Ax = b using the method of 
successive corrections. Start with the approximation 
x} 0) = bja u . At each stage of the iteration estimate the 
eigenvalue of maximum modulus by means of the formula 

it****-*! M r+,) i 



(c\ r) is the rth component of c (r) defined in (6.36). This 
formula can be justified by (6.43).) When successive 
estimates X ( {\ A ( {* l > are such that 

|4+»-Jl$»| <8\k\ r+i) \. 




where 5 is a given number, say 005, adopt k l {* l) as the 
required estimate of the eigenvalue of maximum modulus 
X\. Correct the estimate of x by means of (6.44). Iterate 
four times, starting from the corrected estimate of x, and 
again apply (6.44). (Do not re-estimate X t .) Repeat this 
procedure until the maximum element of the correction 
vector C defined in (6.44b) is less than a specified number c. 
What provision would you make for non-convergence of 
the procedure and for non-validity of the correction formula 
(6.44)? What information would you print out if the 
procedure does not converge, to indicate why the method 
has failed ? (Ex. 1 1 .9 is concerned with a computer program 
for the method of successive over-corrections.) 




Acceleration of convergence, 34, 

Accuracy, 1-18, 41, 95-6, 117 (see 

also Errors) 
Ailkcn's 82-proccss, 33-6, 133 
Algebraic equations, 19-53, 78 
Automatic computer (sec Com- 
puter, automatic) 
Auxiliary matrix, 109 

Back-substitution, 85, 109, 114 
Bairstow's method, vii, 46-9 
Band matrix, 1 19, 142 
Biorthogonality, 130, 149 

Characteristic roots (sec Eigen- 
Chebyshcv series, 15 
Checks, 16, 23, 82-5, 107, 133, 

Column matrix (vector), 103 
Compact elimination method, 

Complex roots of polynomials, 
eigenvalues, 148 
Computer, automatic, v, 54-5 
and hand computing, 75-8 
programming, 54-79 
programs (see Programs) 

Condition, of algebraic equations, 

of linear equations, 92-8, 117-18 
Convergence of iteration, 21 

acceleration of, 34, 133 

criterion, 21, 65, 72 

order of, 31 

rate of, 26, 28, 141 
Cramer's rule, 80, 124 
Curve fitting, 120 

82-procedure, 33-6, 133 
Dependence of vectors, 128 
Desk machine calculation, 11-15 

and automatic computing, 75-8 

square roots, 30 

Determinants, 80, 88, 93, 94, 1 12, 

120, 124, 128 
Diagonal matrix, 104 
Diagonally dominant matrix, 151 
Digital computer (see Computer, 

Direct and iterative methods, 81 

Eigenvalues and eigenvectors, 

calculation of largest, 129-135 
calculation of subdominant, 

properties, 126-8, 147-150 

Equations (see Algebraic or 

Linear equations) 
Error, viii, 1-18 (see also Round- 
off, Ill-conditioning) 
absolute and relative, 3 
inherent, in linear equations, 

92-8, 117-18 
in iterative procedures, 31-7 
in iterative solution of linear 

equations, 140 
in polynomial equations, 41-6 
in sums and products, 3-5 
Expansion in orthogonal vectors, 

Extrapolation, 33-6, 133, 140 

False position, 51 

Fixed and floating point, 6, 18 

Flow-diagram, 62, 64, 74, 100 

Hand-computing (see also 
compared with automatic, 75-8 
desk machine calculation, 11-15 

llermitian matrix, 130 

algebraic equations, 27, 42-6 
linear equations, 92-8, 117-18 

Inherent error, 92 (see also Ill- 

Inner product, 127 (sec also 
Scalar product) 

Interpolation, linear, 12 

Inverse matrix, 112-18 
computation of, 113-15 

Inverse matrix, iterative method 
for computing, 152 

left and right, 116 
Iterative methods, 19-53 

convergence and divergence, 21 

for inverse matrix, 152 

for largest eigenvalue, 129-135 

for linear equations, 81, 138- 

order of, 31 

programs for, 65-75 

Latent roots (see Eigenvalues) 

Least squares, 120 

Linear dependence and inde- 
pendence, 128 

Linear equations, 80-101, 105- 
118, 121 
compact method, 105-112 
errors, 92-8, 117-18 
iterative methods, 138-147 
successive elimination, 82-7 

Linear interpolation, 12 

Matrix, 102-122 

band or tridiagonal, 119 

eigenvalues of, 125 

inverse, 112-18 

left and right inverse, 1 16 

singular and nonsingular, 113 
Memory, 54 

Mistakes, 15-17, 23 (see also 

Newton-Raphson iterative pro- 
cedure, 26-30, 70-5 





Noisc-lcvel, x, 10,66,72, 151 
Nonsingular matrix, 113 
Normalisation of vector, 128 

Operator methods, viii 

Order, of iterative procedure, 31 

Orthogonal vectors, 127-8 

Pivots, 87, 115 
Polynomial equations, 38-50 
Polynomials, evaluation of, 13-15 
Principal diagonal, 105 
Programming, 54-79 
Program, 55 
for eigenvalues, 150 
for iterative procedures, 65-75, 

for iterative solution of linear 

equations, 153 
for method of successive 

elimination, 98-100, 119 
• for roots of polynomials, 72-3 
Proper values (sec Eigenvalues) 

Scalar product, 103 (see also 

Inner product) 
Scaling, 6, 82, 87, 116 
Significant figures, 2 
Simultaneous equations (see 

linear equations) 
Singular matrix, 113 
Square root, 30, 66 
Square root method for linear 

equations, 119 
Statement, 56 
Statistical theory of roundoff 

error, 9 
Successive corrections, method 

of, 138 
Successive elimination, 82-7 
Symmetrical matrix, 105 

I, definition of, 36 
Transcendental equation, 19-38, 

50-1, 78 
Transpose of matrix, 105 
Triangular matrix, 105, 142 
Tridiagonal matrix, 119, 142 

Rate of convergence, 26, 28, 141 

(see also Convergence) 
Rayleigh quotient, 150 
Residual, 42, 75, 94 
Round-off, 2-10 

noise (see Noisc-lcvel) 
Row matrix (vector), 103 

Unit matrix, 105 

Vector, 103 
expansion in terms of ortho- 
gonal functions, 128 



Numerical Methods: 1 

Programming and 
Algebraic Equations 

It is desirable that students of 
applied mathematics, the physical sciences, and 
engineering should be given an introduction to 
numerical analysis. The aim of this new title in 
the University Mathematical Texts series is to 
provide an elementary exposition of certain 
basic ideas. This has been done by 
considering a strictly limited selection of 
typical methods with particular emphasis on 
fundamentals, including error analysis, 
error estimation, pitfalls and the detection of 
mistakes. An effort has been made to avoid the 
common practice of regarding numerical 
methods as a collection of algorithms or 
recipes. Since the advent of automatic computeis 
has produced a revolution in computing, 
programming for computers has been made an 
integral part of the exposition. 
Volume 1 of this title deals with accuracy and 
error, iterative methods, algebraic 
equations, matrix methods and eigen values. 
Volume 2 deals with numerical methods for 
solving ordinary and partial differential equations. 

Net price 10s6d